SECTION I

INTRODUCTION  AND  SUMMARY 


1.1 Overview

The task of signal processing in advanced avionic weapon systems
is to abstract megabits-per-second sensor information into a
manageable basis for decision making. Reduction of system size,
cost and power, and increase of reliability require exploiting the
latest technology. Operating mode flexibility is equally important
for extended visibility into the environment, for performance
improvement, for faster reaction time, or for lower ECM
vulnerability.

This study defines a micro signal processor design based on
maximum use of "off-the-shelf" IC functions, supplemented by
special personalizations of standard arrays. A top-down analysis
was employed and involved the following tasks:

1)  A functional  analysis  of  signal  processing 
tasks  at  the  algorithm  and  processing  task 
level 

2)  A performance  analysis  of  representative 
tasks 

3)  A state-of-the-art  review  of  industry  LSI 
micro  processor  development 

4)  A detailed  functional  definition  of  the 
hardware  and  software  aspects  of  micro 
signal  processor  circuit  elements 

5)  A simulation  of  the  approaches  defined 
above 

6)  A circuit technology review to isolate
trends, and

7)  A development  plan 


An outline of the analysis results is presented in Table 1.
A simplistic view of the micro signal processor parts is presented
in Figure 1. Sections 1 and 2 of this report will delve into the
nature of these elements in more detail.

Figure 1 - Micro Signal Processor Parts

A typical application for the μSP will be a minimum configuration
with only one of each type of element. Such an application arises
in the usual system growth shown in Figure 2, from no specialized
signal processor to recognizing the cost-effectiveness of a μSP.
First, all signal processing is done by a GP computer. Second, a
hardwired FFT box gets introduced. Third, a couple of batch sizes
are employed, necessitating a load-while-processing type buffer.
From there, after a few more system requirement changes, a μSP is
justifiable.

More powerful signal processing systems are also possible with
this same set of building blocks. Figure 3a shows one possible
scheme for netting both subsystems and systems. A key concept
developed in this study is the signal processor family capability
for this design. Family features are summarized in Table 2. The
potential thus exists for a compatible Maxi signal processor such
as that of Figure 3b.

TABLE 2

SIGNAL  PROCESSOR  FAMILY  CONCEPT 

BASIC  INSTRUCTION  SET  PLUS  EXPANSION  SET 
DATA  & I/O  FORMAT  COMPATIBILITY 
COMMON  SUPPORT  SOFTWARE 

BUILDING  BLOCKS  (ELEMENTS)  CONFIGURABLE 
FOR  RANGE  OF  THRUPUTS:  MICRO  TO  MAXI 

PROCESSORS  CAN  NET  IN  PARALLEL  AND/OR 
SERIAL 

MEMORY  SPEED  VS  DENSITY  EXPLOITABLE 

IMPLEMENTABLE WITH TODAY'S AND TOMORROW'S LSI:

• COMMERCIAL, ARRAYS, CUSTOM

• LSTTL, CMOS-SOS, ECL, ...

Figure 3a - Netting Subsystems & Systems

Figure 3b - Maxi Signal Processor


1.2 μSP Characteristics Summary


The μSP elements can be grouped into Control, Memory, and
Arithmetic type elements. Table 3 lists the key functions
performed by each group of elements. These elements are stackable
into a variety of configurations because of the standardization
applied to data transfer timing. As shown in Figure 4, the
unidirectional data buses transfer data on a four cycle timing
basis. These four slots transfer either two complex words or
two double length words or one of each. The timing to a memory
element or arithmetic element may differ because of the pipelined
processing of data, but only by an integer multiple of 4 clocks.
Hence, the order of element placement is restricted by what makes
computational sense, rather than implementation peculiarity.

Such time-sharing of one bus is also very pin efficient, both in
inter-element connections and in buffer storage pins.
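
The four-slot bus timing described above can be illustrated with a minimal sketch. It assumes 12-bit slots (the pipeline bus width quoted for Figure 6), that a complex word occupies a real slot followed by an imaginary slot, and that a double length word is sent high half first. The slot ordering and the function names are illustrative assumptions, not taken from the report.

```python
SLOT_BITS = 12              # assumed slot width (12-bit pipeline buses)
SLOTS_PER_MACRO_CYCLE = 4   # four clock slots per macro cycle
MASK = (1 << SLOT_BITS) - 1


def complex_slots(real, imag):
    """Two slots for one complex word (assumed order: real, then imaginary)."""
    return [real & MASK, imag & MASK]


def double_slots(value):
    """Two slots for one double length word (assumed order: high, then low)."""
    return [(value >> SLOT_BITS) & MASK, value & MASK]


def macro_cycle(first, second):
    """Combine two two-slot words into the four slots of one macro cycle."""
    slots = first + second
    if len(slots) != SLOTS_PER_MACRO_CYCLE:
        raise ValueError("a macro cycle carries exactly four slots")
    return slots
```

Any mix of the two word types fills the same four slots, which is why elements can be stacked without regard to what the words mean.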

The control is composed of two elements, a sequencer and an
address generator. The sequencer element takes care of program
branching, while the address generator provides both read and
write addresses for the various types of memories. Some of their
key features are shown in Table 4. Also shown in Table 4 is a
summary of the arithmetic pipeline characteristics. The pipeline
is composed of three elements, illustrated in Figure 5.

Figure 6 shows the standard interconnection of the system
elements, including sequencer, address generator, local memory,
bulk memory, coefficient memory and pipeline stages 1, 2, and 3.
There is a 16 bit I/O bus which communicates to other μP's or
μSP's. There is a pipeline input bus and a pipeline output bus,
each of which is 12 bits and transfers data over 4 clock cycles
during one macro cycle. The address bus to the coefficient and
bulk memories also serves as a bidirectional data bus to connect
with the GP I/O, and for load-while-processing operation.

Stage 2 implements data routing and multiply.

Stage  3 has  a pair  of  adders  and  a shift  select  to  do 
complex,  double  precision  arithmetic  and  sorting. 


TABLE 4

μSP HARDWARE ELEMENTS FEATURES

Figure 5 - μSP Arithmetic Pipeline

A μSP system can also be configured without a bulk memory
element. Figure 7 shows such a configuration, where a path for
signal processor data to the GP computer is possible through the
coefficient memory. This alternative provides for the very
smallest system configuration, where minimum cost of total logic
is important.

Note that the system elements have been defined to minimize
the number of pins. For example, two unidirectional buses of 12
bits are used to and from the pipeline rather than one 24 bit
bidirectional bus to reduce the number of pins to the pipeline
stages. All elements shall fit within a 64 pin interconnection,
allowing convenient size hybridization. Further, when LSI
densities permit, elements may be combined without increasing pin
totals. An example is merging pipeline stages 2 and 3 or even
stages 1, 2 and 3.

The actual element control has been chosen for
straightforwardness rather than minimization of numbers of control
bits. Thus the address generator's memory, for example, is a little
over 16 bits wide. This simplifies the decoding logic, adds to
the intelligibility of the designs, and allows for later
optimization with a particular LSI technology implementation. We
expect that as more experience is gathered with the μSP design,
reductions will suggest themselves. For example, the sequencer
element has dropped the operations "DEL" and "PLP" as being too
specialized, peculiar to the 8X02 sequencer IC, and unnecessary
to typical programs.

A hierarchy of software exists when applying the μSP for a
given mission. Each mission, such as ground mapping, has several
modes. Those modes are composed of common algorithms such as
pulse compression or FIR filtering. These algorithms are in turn
composed of macro commands such as FFT "butterfly" or pole-pair
calculation. The actual micro bits that control adders and
multipliers are really invisible to the user. In essence, the macro
commands are the instruction set for this μSP. Table 5 summarizes
the capabilities for the μSP firmware/software. Some times
associated with various algorithms are presented in Table 6.
These times are based on readily available LSTTL implementation
using a 150 ns clock. Times for a higher speed technology
implementation would scale accordingly.

As part of this study, an assembler and simulator effort
was undertaken. Effort was concentrated on the control areas,
because with the pressure of time, these represented critical
areas for validating μSP efficiency. Code was generated in
Fortran on the CDC CYBER 73 system. Coding was based on an
Instruction Set Processor (ISP) description for the sequencer and
address generator. More details can be found in the Simulation
User's Manual (Raytheon BR-9632), and in chapter 6 of this
report. Figure 8 outlines the software simulation flow.

Implementation of these μSP elements was seriously considered.
The primary emphasis was on exploiting the available
semiconductor industry LSI, both current and visible trends. The
limitations of relying on the commercial world were also
considered, leading to recommended chip types whose designs are
quite general and yet significantly reduce the number of
total chips. Table 7 lists the major chip types recommended.

Table 8 shows the parts count for implementing the μSP with either
all commercial IC's today or with the best of available LSI and
using Raytheon's 300 gate array IC's to form the recommended
chip types. Note that the number of parts per element in each
group is about the same, although for different reasons, such
as memory bit width limitations or arithmetic complexity
limitation.

More detail on the μSP elements is presented in the next
chapter, with the backup for its derivation following in Sections
III, IV and V.

μSP ALGORITHMS EXAMPLES


Figure  8 - Simulation  Flow 


SECTION  II 

ARCHITECTURE  OF  ELEMENTS 


2.1 Orientation

This section develops a computer architecture basis for the
design of the micro signal processor elements. Small hardware
increases in the memory, arithmetic, control and I/O areas
significantly increase computation rate, thus increasing efficiency
compared with classic mini computer/micro processor architecture.
We foresee that many of the concepts postulated for this μSP will
appear in future versions of commercial micro processors.

Raw computing power is only one measure of the worth of any
signal processing architecture. This study has analyzed
mainstream architecture concepts in detail to judge them by
quantitative and qualitative measures including:

• Hardware smallness, showing effects of design efficiency
with LSI logic and packaging technology choice

• Hardware simplicity, telling the degree of design match
with off-the-shelf building blocks, and the intelligibility
of the logic organization

• Hardware assurity, measuring implementation uniqueness
and likelihood of interfering with the detail design

• Software thruput, measuring useful computer power
for the algorithm mix wanted

• Software simplicity, including coding level, ease of
use, and ease of learning

• Software assurity, measuring programming language
uniqueness, and likelihood of degrading the software

• Technology transferability, a critical factor for military
equipment with their long development cycles compared to
semiconductor industry advances

• System efficiency, covering the integration aspects not
specifically included in any of the above

Our preliminary judgement of some major candidate architectures
is summarized in Figure 9. For example, micro processors,
netted and/or with multipliers, are shown to be most desirable
from the standpoint of hardware size and cost. However, their
thruput and software fit to signal processing are not as good
as other alternatives.

One architecture alternative shown in Figure 9 implements
each task of a big, powerful signal processor with small
byte-slices of logic. This approach can theoretically take some
optimum architecture and trade away speed for reduced hardware.
Most of the logic then fits into two or four bit LSI CPU slices.
The resulting small number of interconnections often allows PROM
or FPLA to replace a collection of cascaded functions. Raytheon
understands these techniques because we have used them for fast
(1 μsec/point) and for small (38 IC) FFTs. End-bit logic problems
and conceptual complexity of byte-slice design, however, appear
to limit their application to fixed function signal processing.
We emphasize that concepts of large signal processors must be
adapted, not adopted, to achieve a μSP.

Figure 9 - Signal Processor Schemes
(Relative Advantages of Each)

Also  shown  in  Figure  9 is  Raytheon's  Mini  GPSP,  a medium 
size  design  with  many  nice  features,  but  too  large  for  direct 
application  here. 

The postulated micro signal processor resulting from this
study is also listed in this same table as being small but not
smallest; fitting available ICs to a great extent, but not
completely; and having final hardware and software detailed by this
study.

GP computers, even those using bipolar LSI CPU slices, are
dominated in size by their general interfacing structures.
Adding a fast multiplier speeds up a micro or mini computer, but
still leaves thruput an order of magnitude too slow for signal
processing. Not unexpectedly, signal processors are very
efficient for arithmetic-oriented tasks. We claim that the μSP has
more thruput for its size by at least a couple of powers of
two than bigger signal processors. Consequently, large netted
μSP systems can be viable, opening up a wider application
potential.

The  architecture  partitioning  used  successively  in  Raytheon's 
large  and  Mini  programmable  signal  processor  was  determined  to 
be  appropriate  here.  This  partitioning  separates  the  design 
into  elements  as  follows: 

• CONTROL ELEMENTS:

    • SEQUENCER

    • ADDRESS GENERATOR

• MEMORY ELEMENTS:

    • LOCAL MEMORY

    • COEFFICIENT MEMORY

    • BULK MEMORY

• ARITHMETIC ELEMENTS:

    • SCALING

    • MULTIPLYING

    • ADDITION


The  designs  for  each  of  the  above  elements  are  derived  in 
the  following  sections. 

2.2 Control and Interface

2.2.1  Overview 

The control elements include a sequencer and an address
generator. The purpose of the sequencer is to determine both the
order of instruction execution and the number of times each
instruction is repeated. The purpose of the address generator is
to generate the required set of memory addresses for each macro
instruction cycle. Communication to the external command and
control GP computer is routed through these control elements.
Communication includes loading programs, modifying subroutine
calling parameters, reading check point values (Built-In Test
Equipment or BITE), and returning target locations and strengths.

The sequencer element is developed in section 2.2.2. It
does the bookkeeping for nested macro loops, subroutine linking,
and parameter manipulation. A fixed timing cycle is desired to
preserve 100% arithmetic and memory utilization for the μSP.
This desire conflicts with the utilization of standard GP type
components to implement what is basically a GP type function.
Hence we have developed the novel concept of using a FIFO type
buffer interface between the sequencer and the rest of the
control logic. This approach avoids the need for complex look-ahead
type sequencer instructions in this sequencer element, which is
otherwise compatible with commercial micro processor sequencers.


The resulting sequencer design is shown in block form in
Figure 10. The simulation of the control section is thus based
on two processors, the sequencer and the address generator,
which are asynchronous to each other. Analytic work was also
done on the worst case code, namely manipulating an n-dimensional
matrix where each dimension is only two words wide (such as
forming the bit-reverse positioning of the FFT output). A
sufficient condition to ensure 100% utilization of machine
thruput is that the clock driving the sequencer logic must operate
at a rate at least three times the macro instruction rate. Since
we use four arithmetic clocks to one macro instruction, there
can even be a surplus of sequencer cycles available for
time-sharing elsewhere. The potential exists for using this
surplus to do most of the signal processor driving tasks usually
handled by a separate GP computer.

Figure 10 - μSP Sequencer Block Diagram

The μSP's sequencer instruction set has been defined for
maximum ease of use and to relate to available "sequencer" type
IC's. The number of distinct memory control and data fields is
thus larger than necessary. This can be pruned down in the next
phase when finalizing the design.

The sequencer design can accommodate future generation LSI
IC's. We expect that all of the sequencer except for the output
data segment could be replaced by the future bipolar single chip
μP's. Aside from component reduction and instruction set
expansion, this evolution provides potential for incorporating the
GP drive computer into part of the μSP design. Such a degree of
sophistication is not provided for this generation of μSP, although
the sequencer's operation shall be asynchronous to the arithmetic
function execution.

The address generator element provides four distinct addresses
every macro cycle. These addresses are based on either: a)
initialization to a specified location, or b) advancing by a
specified increment. One address goes to the coefficient memory,
one to the bulk memory and two to the local memories. The latter
two are delayed by the arithmetic pipeline length to provide a
total of two read and two write addresses for the local memories.
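
The address generation just described can be sketched as a small behavioral model. This is only an illustration under stated assumptions: the memory names, the command tuples, the 16-bit wrap-around and the pipeline delay expressed in macro cycles are all hypothetical choices made for the sketch, not details given in the report.

```python
from collections import deque


class AddressGenerator:
    """Behavioral sketch: four addresses per macro cycle, each either
    initialized to a location or advanced by an increment."""

    MEMORIES = ("coef", "bulk", "local1", "local2")   # hypothetical names

    def __init__(self, pipeline_delay, width=16):
        self.mask = (1 << width) - 1                  # assumed address width
        self.addr = {name: 0 for name in self.MEMORIES}
        # Local read addresses are delayed by the pipeline length so that
        # results are written back where the operands came from.
        self.delay = {name: deque([None] * pipeline_delay)
                      for name in ("local1", "local2")}

    def step(self, commands):
        """commands maps a memory name to ("init", value) or ("inc", value)."""
        for name, (op, value) in commands.items():
            if op == "init":
                self.addr[name] = value & self.mask
            elif op == "inc":
                self.addr[name] = (self.addr[name] + value) & self.mask
            else:
                raise ValueError(op)
        reads = dict(self.addr)
        writes = {}
        for name, line in self.delay.items():
            line.append(self.addr[name])
            writes[name] = line.popleft()   # None until the pipeline fills
        return reads, writes
```

With a pipeline delay of two macro cycles, the write address seen on the third step equals the read address issued on the first, giving two read and two write addresses for the local memories from only two address registers.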

Figure 11 shows the block diagram for the address generator.

Figure 11 - Address Generator Block Diagram

Note that an address width of 16 bits is provided to the data
bulk and coefficient memories. These addresses are controlled
by the address generator pointer by way of the address generator
microcode. The 16 bits allows for fractional addressing of
coefficient tables with less than the full amount of address space.
Note also the use of an address buffer memory that can save
either a delayed address or a data component based on an
arithmetic pipeline condition. This address manipulation capability
provides for data sorting within the μSP. Design details are
found in section II.

The control elements both have an amount of program memory.
We foresee that a mixture of PROM and RAM can be employed
for these purposes. Subroutines can be coded in PROM's while
calling sequences, including parameters to be passed, should be
kept in RAM. Some parameters can also be passed by reserved
register convention in the corresponding element's RALU's.

Several design implementations were generated for these
control elements and comparisons made of speed, area and pin
totals. A temporary edge in operating speed existed for dropping
down to MSI IC's in some areas. Hybrid packaging is a strong
contender for achieving volumetric efficiency despite 40 pin IC's.
A disturbing trend was the revelation that samples of the newest
LSI were slower than promised by significant amounts. The
potential of Raytheon's 300 gate array to make an interim design
thus becomes more significant.

2.2.2  Sequencer  Design 

The sequencer for the μSP aims at optimizing speed, real
estate, and simplicity. Held constant is a baseline macro coding
technique compatible with Raytheon's existing signal processors.

The sequencer block defines the order of execution and
the number of repetitions of each arithmetic pipeline macro
command and each address generator group command. The sequencer
function is very similar to the basic function performed by any
of the bipolar microprocessor sequencer type chips like the
54S482, 9408, and 8X02. These chips however lack the GPSP's
multiway jump mechanism based on counter status flags.
Therefore arithmetic execution cycles must be skipped some of the
time in nested looping. Here, we propose connecting a μP type
sequencer to the rest of the μSP through a FIFO buffer. Only a
pointer to the arithmetic unit is stored in the FIFO to indicate
the instructions to be executed. Sequencing without an arithmetic
unit execution cycle occurs by not storing an instruction in the
FIFO.

Upon machine initialization, sequencer execution starts
to fill up the buffer and soon gets ahead of the arithmetic unit's
address generator execution. If the buffer is full, the
sequencer's clock is shut off. When the buffer is empty, the address
generator's clock is shut off. The FIFO allows two different
clock rates for writing and reading. Therefore, the sequencer's
clock cycle, which affects loading of the FIFO, can be
slower than the pipeline clock cycle, which determines the
unloading rate of the FIFO. Thus, simple slow sequencer logic
works with high speed arithmetic pipelining.
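
The FIFO decoupling described above can be explored with a minimal tick-based sketch. It is not the study's Fortran simulator. Time is counted in pipeline clock ticks, the sequencer takes one step every `seq_period` ticks, and the address generator and pipeline consume one FIFO word every `macro_period` ticks. The function name, the parameters, and the rule that a sequencer write precedes a read on the same tick are assumptions made for illustration.

```python
from collections import deque


def simulate(loads, seq_period, macro_period, depth):
    """loads[i] is True when sequencer step i puts a word in the FIFO.

    Returns (macro cycles executed, macro cycles lost to an empty FIFO).
    """
    fifo = deque()
    pc = 0            # index of the next sequencer step
    executed = 0
    idle = 0
    total = sum(loads)
    tick = 0
    while executed < total:
        tick += 1
        if tick % seq_period == 0 and pc < len(loads):
            if not loads[pc]:
                pc += 1                # bookkeeping step, nothing stored
            elif len(fifo) < depth:
                fifo.append(pc)        # pointer for the pipeline
                pc += 1
            # otherwise the FIFO is full and the sequencer clock is held off
        if tick % macro_period == 0:
            if fifo:
                fifo.popleft()
                executed += 1
            else:
                idle += 1              # FIFO empty: consumer clock held off
    return executed, idle
```

For example, a program in which only every third sequencer step loads the FIFO still keeps a four-clock macro cycle fully busy when the sequencer steps once per tick, while a sequencer slower than the macro rate leaves the pipeline idle part of the time.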

The postulated design provides for incorporating future
generation LSI μP parts. All of the sequencer, except a portion
of memory and the FIFO and its control, could be replaced by the
coming high speed integrated μP's. Instruction repertoire would
be expanded allowing more things to happen sooner and possibly
with easier coding.

2.2.2.1 Next Address Control

The macro instruction set postulated for the sequencer
next address control allows direct branching or subroutine
handling and looping by the program stack. The Branch Address/
Quantity field is 12 bits. Only 10 bits will be used for the
branch address label while all 12 bits will be available for the
quantity field used for loading the counters. For a minimal
system this field could be limited to 8 bits.

The instructions are as follows:

Mnemonic  Description    Test   Next Address    Stack      Stack Pointer

INC       Increment      X      Current + 1     N.C.       N.C.
BRF       Test and Jump  True   Current + 1     N.C.       N.C.
                         False  Branch Address  N.C.       N.C.
BRU       Jump           X      Branch Address  N.C.       N.C.
POP       Pop and Jump   X      Stack Top       POP'ed     Decr.
BSR       Push and Jump  X      Branch Address  Current+1  Incr.
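
The five next-address instructions can be summarized in a short behavioral sketch. The Python list standing in for the hardware stack, the function name, and the wrap at 10 bits (the branch address label width quoted above) are illustrative assumptions.

```python
ADDRESS_MASK = 0x3FF   # 10-bit sequencer address (assumed wrap-around)


def next_address(op, current, branch, stack, test=False):
    """Return the next sequencer address; `stack` is modified in place."""
    if op == "INC":                       # increment
        return (current + 1) & ADDRESS_MASK
    if op == "BRF":                       # branch when the test is false
        return (current + 1) & ADDRESS_MASK if test else branch
    if op == "BRU":                       # unconditional jump
        return branch
    if op == "POP":                       # pop and jump to the stack top
        return stack.pop()
    if op == "BSR":                       # push return address and jump
        stack.append((current + 1) & ADDRESS_MASK)
        return branch
    raise ValueError(op)
```

Here the stack pointer is implicit in the list length: BSR increments it and POP decrements it, matching the last column of the table.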

2.2.2.2 Counter Control

16 RAM words act as indexes or counters, controlled
by three mnemonics. The sequencer memory will also contain two
fields for addressing these counters; one field will be called
the main index field and the other the reload index field. The
3 instructions are as follows:

LQ:  Load main index register from the quantity field

LI:  Load main index register from the reload index
register

DR:  Decrement, test the result for zero, and either
write the result into the main index register
if not zero or write the reload index register
into the main index register if zero.
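
A minimal sketch of the three counter instructions follows. The 12-bit counter width is an assumption taken from the width of the quantity field, and the function name and the returned flag (true when the decrement reaches zero) are illustrative.

```python
COUNTER_MASK = 0xFFF   # assumed 12-bit counters, matching the quantity field


def counter_op(op, counters, main, reload, quantity=0):
    """Apply LQ, LI or DR to the 16-word counter RAM.

    Returns True only when DR decrements the main index to zero.
    """
    if op == "LQ":                        # load from the quantity field
        counters[main] = quantity & COUNTER_MASK
        return False
    if op == "LI":                        # load from the reload index
        counters[main] = counters[reload]
        return False
    if op == "DR":                        # decrement and test
        result = (counters[main] - 1) & COUNTER_MASK
        if result == 0:
            counters[main] = counters[reload]   # loop finished: reload
            return True
        counters[main] = result
        return False
    raise ValueError(op)
```

Loading a count of 3 and issuing DR three times returns the zero flag on the third call and restores the count from the reload register, which is the behavior a nested loop needs.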


2.2.2.3 FIFO Control

Three mnemonics control the loading of appropriate
words into the FIFO. Two different fields are loaded into the
FIFO: an 8 bit AU macro label, and an 8 bit address generator
pointer for a total of 16 bits into the buffer. The 3
instructions are as follows:

AL:  Always load the FIFO

NV:  Never load the FIFO

XT:  Always load except when the test is true
2.2.2.4 Sequencer Memory

The sequencer memory will consist of a 1K x 43 RAM
with structure as follows:

BITS    FIELD         MNEMONICS

0-2     INST OP CODE  INC, BRF, BRU, POP, BSR
3-14    ADR/QUAN      LABEL, or quantity to 4096
15-16   CNT CON       LQ, LI, DR
17-20   MAIN IN       0 to 15
21-24   REL IN        0 to 15
25-26   FIFO CON      AL, NV, XT
27-34   MICRO         LABEL
35-42   ADGN          LABEL
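
The 43-bit word layout (bits 0-2 op code, 3-14 address/quantity, 15-16 counter control, 17-20 main index, 21-24 reload index, 25-26 FIFO control, 27-34 micro label, 35-42 address generator label) can be checked with a small pack and unpack sketch. Treating bit 0 as the least significant bit is an assumption, since the report does not state the bit order, and the field names are shortened for illustration.

```python
# (name, first bit, last bit); bit 0 is assumed to be the least significant
FIELDS = [
    ("inst", 0, 2),
    ("adr_quan", 3, 14),
    ("cnt_con", 15, 16),
    ("main_in", 17, 20),
    ("rel_in", 21, 24),
    ("fifo_con", 25, 26),
    ("micro", 27, 34),
    ("adgn", 35, 42),
]
WORD_BITS = sum(hi - lo + 1 for _, lo, hi in FIELDS)


def pack(**values):
    """Build one sequencer memory word from named field values."""
    word = 0
    for name, lo, hi in FIELDS:
        width = hi - lo + 1
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lo
    return word


def unpack(word):
    """Split a sequencer memory word back into its fields."""
    return {name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
            for name, lo, hi in FIELDS}
```

The field widths add up to 43, which agrees with the stated memory width.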

2.2.2.5 Hardware Coding Using Discrete MSI

To accomplish the next address functions required,
a design using the Signetics 8X02 sequencer has been done
(Figure 2-4). The 8X02's 4 control inputs relate to the op codes
chosen as follows.

                                      OUTPUTS TO 8X02
OP CODE  TEST CONDITION   S2 S1 S0    AC2 AC1 AC0   TEST INPUT

INC      D.C.             0  0  0     0   0   1     X
POP      D.C.             0  1  0     0   1   1     X
BSR      D.C.             1  0  1     1   0   0     1
BRU      D.C.             1  1  0     1   1   0     1
BRF      1 (CURR.+1)      1  1  1     1   1   0     0
         0 (BR. ADR.)     1  1  1     1   1   0     1

SSI logic generates the above inputs to the 8X02 from the
selected bit patterns plus the Test Condition.

The counter control 2 bit op codes are used to control
the three state outputs of the adder, the RAM, and the quantity
field of the main memory. The op codes will be given a bit
assignment to produce minimal logic, and they will be used to
control the three state outputs as follows:

                                 RAM    ADDER   QUAN   RAM
OP CODE  TEST CONDITION  C2 C1   OE-A   G       G      OE-B

LQ       D.C.            00      1      0       0      1
LI       D.C.            10      0      1       1      0
DR       0               11      1      0       1      0
         1               11      0      1       1      0

The FIFO control has a 2 bit op code for determining when to
load the FIFO with valid data. The FIFO has an input called
shift in which is used to load the data. The coding is
as follows:

                                 FIFO
OP CODE  TEST CONDITION  F2 F1   SI

NV       D.C.            01      0     don't load
AL       D.C.            00      1     load
XT       0               10      1     load
         1               10      0     don't load
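
The shift-in decoding reduces to a single Boolean expression, sketched below. The function name is illustrative, and treating the unassigned code F2 F1 = 11 as "don't load" is an assumption, since the table does not list it.

```python
def shift_in(f2, f1, test):
    """FIFO shift-in (SI) for op code bits F2, F1 and the test condition.

    AL (00) always loads, NV (01) never loads, XT (10) loads unless the
    test is true. Code 11 is unassigned; here it is assumed not to load.
    """
    return int((not f1) and not (f2 and test))
```

Checking the expression against each row of the table is a quick way to confirm that one gate level on F1 plus one on F2 and the test line is enough.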

2.2.2.6 Timing

The DR instruction represents the worst case cycle
time since this goes through a read-add-write sequence. There
are three timing paths to consider. Using data obtained from
vendor spec sheets, the timing paths are as follows:

Counter RAM access + test NAND logic + 8X02 setup +
8X02 output delay + 256 x 4 RAM access
= 30 + 15 + 31 + 34 + 50
= 160 ns

Counter RAM access + test NAND logic + tristate enable
+ counter RAM data setup
= 30 + 25 + 15 + 30
= 100 ns

Counter RAM access + carry bridge adder + tristate delay
+ counter RAM data setup
= 30 + 30 + 22 + 30
= 112 ns

Hence, the most reasonable cycle time is a clock period of at
least 160 ns.
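
The cycle time selection is a longest-path calculation, which the following sketch reproduces from the delays quoted above. The path names are labels invented for this example.

```python
# Delays in ns for the discrete MSI design; path names are illustrative.
PATHS_NS = {
    "next address": [30, 15, 31, 34, 50],
    "reload select": [30, 25, 15, 30],
    "decrement": [30, 30, 22, 30],
}


def min_clock_period(paths):
    """Return (name, delay in ns) of the slowest path, which sets the clock."""
    totals = {name: sum(delays) for name, delays in paths.items()}
    worst = max(totals, key=totals.get)
    return worst, totals[worst]
```

The same function applies to any other set of path delays, so alternative part choices can be compared by editing only the table of delays.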

2.2.2.7 Using the 2901

An alternative approach to the next address function
is to use the 8X02 with a RAM-adder combination (Figure 13).
The control logic to implement the OP CODES defined by S2 S1 S0
will be the same as before. The counter control 2 bit op codes
are used to control the 2901 as follows:

                                 2901 INPUTS
OP CODE  TEST CONDITION  C2 C1   I876   I543   I210   C0

LQ       D.C.            00      2      1      7      1
LI       D.C.            10      2      0      3      0
DR       1               11      2      0      3      0
         0               11      2      0      4      1

The RAM A address is considered the MAIN INDEX. The RAM B
address is controlled by a 2-to-1 selector which during the read
cycle selects the RELOAD INDEX and during the write cycle selects
the MAIN INDEX.

The FIFO control decoding is the same as in the
discrete MSI design.

The DR instruction represents the worst case cycle
time. There are two timing paths to consider, which are as
follows:

Counter RAM access + test NAND logic + 8X02 setup +
8X02 output delay + 256 x 4 RAM access
= 65 + 15 + 31 + 34 + 50
= 195 ns

Counter RAM access + test NAND logic + ALU setup and
carry bridge add + counter RAM data setup
= 65 + 15 + 135 + 30
= 245 ns

Hence, the most reasonable cycle time is a clock period of at
least 245 ns.

Figure 12 - μSP Sequencer (RAM-ADDER Configuration)

Figure 13 - μSP Sequencer (2901 Configuration)

2.2.2.8 Comparison

The RAM-ADDER and the 2901 configuration are now compared
with respect to IC totals, power, package area, and pin
count.

DESIGN     PARTS          POWER (mW)    PINS       AREA (sq in)

RAM-ADDER  3-29705        550 = 1650    28 = 84    .9 x 4 = 3.6
           1-8X02         650 = 650     28 = 28    .24 x 16 = 3.8
           3-LS283        100 = 300     16 = 48
           4-5741         500 = 2000    16 = 64
           4-HEX 3-STATE  325 = 1300    16 = 64
           5-SSI          100 = 500     14 = 70

           20 IC          6.4 W         358 PINS   7.4 sq in

2901       1-8X02         650 = 650     28 = 28    1.2 x 3 = 3.6
           4-5741         500 = 2000    16 = 64    .9 x 1 = .9
           3-2901         1000 = 3000   40 = 120   .24 x 8 = 1.9
           4-SSI          100 = 400     14 = 56

           12 IC          6.0 W         268 PINS   6.4 sq in
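
The totals in the comparison can be cross-checked with a short tally. Each entry below is (part, quantity, power per part in mW, pins per part), as read from the table. The variable and function names are illustrative, and the per-part figure of 14 pins for the SSI packages is taken as 14 x 5 = 70 in the first design.

```python
# (part, quantity, mW per part, pins per part), as read from the table
RAM_ADDER = [
    ("29705", 3, 550, 28),
    ("8X02", 1, 650, 28),
    ("LS283", 3, 100, 16),
    ("5741", 4, 500, 16),
    ("HEX 3-STATE", 4, 325, 16),
    ("SSI", 5, 100, 14),
]
DESIGN_2901 = [
    ("8X02", 1, 650, 28),
    ("5741", 4, 500, 16),
    ("2901", 3, 1000, 40),
    ("SSI", 4, 100, 14),
]


def totals(design):
    """Return (IC count, power in watts, pin count) for one design."""
    parts = sum(qty for _, qty, _, _ in design)
    power_w = sum(qty * mw for _, qty, mw, _ in design) / 1000
    pins = sum(qty * pin for _, qty, _, pin in design)
    return parts, power_w, pins
```

The tally gives 20 ICs, 6.4 W and 358 pins for the RAM-ADDER design and 12 ICs, 6.05 W and 268 pins for the 2901 design, in line with the rounded figures in the table.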

2.2.2.9 Sequencer Outputs

The μSP Sequencer produces 4 outputs:

1)  The 10 bit address field from the 8X02 which
is used to address the main sequencer RAM memory.

2)  The 8 bit address field from the FIFO used to
address the micro memory.

3)  The 8 bit address field from the FIFO used to
address the address generator memory.

4)  The FIFO empty signal which will be used in the
address generator memory control logic.

2.2.2.10 Coding Examples


Figure 14 presents an example of a nested loop coded on the µSP
sequencer. Also listed is the resulting sequence of macro
instructions and the sequence of address increments.

Programming of a 64 point complex FFT followed by a bit reverse
routine has been completed and simulated. This routine takes 28
lines of sequencer code to write, 319 steps to execute, and 258
address generator/macro cycles to execute. Aside from the initial
conditions, the pipeline clock worked all the time, doing 192 FFT
butterfly macros, 64 bit-reverse macros and 8 pipeline flush
macros. More efficient code could be written, but even this effort
has shown that FIFO buffering between the sequencer and the rest
of the µSP is a viable concept.
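The butterfly and bit-reverse macro counts are what a radix-2 transform would give: N/2 butterflies per stage over log2 N stages is 32 x 6 = 192 for N = 64, and the bit reverse pass visits each of the 64 points once. The sketch below checks that arithmetic. The radix-2 structure and the function names are assumptions made for illustration; the simulated sequencer code itself is not reproduced here.

```python
def butterfly_count(n):
    """Butterflies in a radix-2 FFT of length n (n a power of two)."""
    stages = n.bit_length() - 1          # log2(n)
    return (n // 2) * stages


def bit_reverse(index, bits):
    """Reverse the low `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


if __name__ == "__main__":
    n = 64
    bits = n.bit_length() - 1
    print("butterflies:", butterfly_count(n))
    print("first reordered addresses:",
          [bit_reverse(i, bits) for i in range(8)])
```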

2.2.3 µSP Address Generator

The address generator creates the addresses for four memories:
data memory 1, data memory 2, the coefficient memory, and the
bulk memory. The address generator control memory will be up to
1K words of 16 bits, and initially all RAM. This 4 cycle machine
generates an address for one memory on each cycle. The control
memory thus has 256 words for each memory. The FIFO from the
sequencer sends an 8 bit address to the MSB's of the ADGN RAM. A
2 bit counter is used to generate the 2 LSB's to the ADGN RAM.
Each 8 bit address sent by the FIFO causes the counter to generate
4 counts. Hence, every 4 cycles all 4 memories are addressed.
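The way the 10 bit ADGN RAM address is formed from the FIFO output and the counter can be sketched as follows. The function names are illustrative; only the 8 bit MSB field and the 2 bit LSB counter come from the text.

```python
def adgn_ram_address(fifo_address, count):
    """Combine the 8 bit FIFO address (MSB's) with the 2 bit counter (LSB's)."""
    if not 0 <= fifo_address < 256:
        raise ValueError("FIFO address must fit in 8 bits")
    if not 0 <= count < 4:
        raise ValueError("counter must fit in 2 bits")
    return (fifo_address << 2) | count


def addresses_for(fifo_address):
    """The four control words fetched for one FIFO entry, one per cycle."""
    return [adgn_ram_address(fifo_address, count) for count in range(4)]


if __name__ == "__main__":
    print(addresses_for(5))
```

With 256 FIFO addresses and 4 counts each, the full 1K word control memory is covered.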

Figure 14 - Example of µSP Coding

2.2.3.1 Instruction Set

The instruction set for creating the addresses will consist of
six instructions as follows.

1. LQ - Load Quantity into destination
   (12 bits plus 4 zeroes)

      0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
      | 0 0 | QTY | DEST |

2. IL - Add increment to low part of source and put into
   destination (extend sign of increment)

      | 0 1 | INC | SOURCE | DEST |

3. IH - Add increment to high part of source and put into
   destination (zero fill LSB's of increment)

      | 1 0 | INC | SOURCE | DEST |

4. SS - Add source to source 2 and put into destination

      | 1 1 1 1 | X X X X | SOURCE 2 | SOURCE | DEST |

5. LB - Load address buffer into destination

      | 1 1 ? ? | X X X X X X X X X X | DEST |

6. SB - Add source to address buffer and put into destination

      | 1 1 0 1 | X X X X X X X | SOURCE | DEST |

A register to register move can be accomplished by an increment
of zero using either instruction IL or IH. To add the source to
the destination, instruction SS is used by placing the destination
in the source 2 field. The LQ instruction is used as follows:

        Q11 ... Q0  0  0  0  0
      = DEST

where Q11 is bit 2 and Q0 is bit 13 in the instruction.


The IL instruction is used as follows:

        S15 S14 ... S8  S7 ... S0
      + I8  ...     I8  I7 ... I0
      = D15 D14 ...            D0

where I8 (the sign) is bit 2 and I0 is bit 10 in the instruction
and I is a 2's complement number.

The IH instruction is used as follows:

        S15 S14 ... S7  S6 ... S0
      + I8  ...     I0  0  ... 0
      = D15 D14 ...            D0

where I8 (the sign) is bit 2 and I0 is bit 10 in the instruction
and I is a 2's complement number.

2.2.3.2 Hardware

There are 32 source and destination registers contained in two
16 word RAM's. Each of the 4 memories has 8 registers assigned to
it. The 2 bit counter output is used with the 3 bit source and
destination fields to partition the 32 registers into 4 sections
as follows:

      Qb   Qa   RAM Selected   Addresses   MEMORY

      0    0    UNIT 1         0 - 7       DATA MEM 1
      0    1    UNIT 1         8 - 15      DATA MEM 2
      1    0    UNIT 2         0 - 7       COEFFICIENT MEM
      1    1    UNIT 2         8 - 15      BULK MEM


When  the  FIFO  is  empty,  a low  signal  is  sent  to  the 
ADGN  clock  logic.  This  is  appropriately  used  to  inhibit  the 
clocks  used  in  the  ADGN. 
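The register partitioning and the arithmetic performed by the six address generator instructions can be sketched in a few lines. This is a sketch under stated assumptions, not the hardware definition: registers are taken to be 16 bits wide, the increment is taken to be 9 bits (bit 2 through bit 10 of the instruction), and the 7 bit left alignment used for IH is our inference from "zero fill LSB's". All function names are illustrative.

```python
MASK16 = 0xFFFF  # assumed 16 bit address registers


def register_location(count, field):
    """Map the 2 bit counter (Qb Qa) and a 3 bit register field to
    (RAM unit, RAM address), following the partition table."""
    qb, qa = (count >> 1) & 1, count & 1
    return qb + 1, qa * 8 + field


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def lq(qty):
    """LQ: 12 bit quantity followed by 4 zeroes."""
    return (qty & 0xFFF) << 4


def il(source, inc):
    """IL: add the sign extended 9 bit increment to the source."""
    return (source + sign_extend(inc, 9)) & MASK16


def ih(source, inc):
    """IH: add the increment to the high part; LSB's are zero filled.
    The shift of 7 is an assumption (9 bit field in a 16 bit word)."""
    return (source + ((inc & 0x1FF) << 7)) & MASK16


def ss(source, source2):
    """SS: add source to source 2."""
    return (source + source2) & MASK16


def lb(address_buffer):
    """LB: load the address buffer."""
    return address_buffer & MASK16


def sb(source, address_buffer):
    """SB: add source to the address buffer."""
    return (source + address_buffer) & MASK16


if __name__ == "__main__":
    print(register_location(3, 7))      # BULK MEM section
    print(hex(lq(0xABC)), il(100, 0x1FF), ih(0, 1))
```

An increment of zero in `il` or `ih` returns the source unchanged, which is the register to register move described in the instruction set.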

2.2.3.3 Timing

Two designs were done. One used a discrete RAM-ALU configuration
(Figure 15) while the other used the AMD 2901 microprocessor
(Figure 16). The worst case timing path for each design is as
follows.

RAM-ALU:

FIFO access + 1K x 4 RAM access + 2 to 1 select +
counter RAM access + look ahead add +
2 to 1 select + counter RAM data setup
= 30+50+12+23+20+12+30
= 177 ns

2901:

FIFO access + 1K x 4 RAM access + 4 to 1 select +
look ahead add + 2 to 1 select + counter RAM data setup
= 30+50+20+138+12+30
= 280 ns
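The two sums can be checked with a short script. The `overlapped` argument in this sketch is our own device: it models stages whose delay is hidden from the critical path, which is the refinement the report turns to next. Stage names follow the text; everything else is illustrative.

```python
RAM_ALU = [
    ("FIFO access", 30), ("1K x 4 RAM access", 50), ("2 to 1 select", 12),
    ("counter RAM access", 23), ("look ahead add", 20),
    ("2 to 1 select", 12), ("counter RAM data setup", 30),
]
AMD_2901 = [
    ("FIFO access", 30), ("1K x 4 RAM access", 50), ("4 to 1 select", 20),
    ("look ahead add", 138), ("2 to 1 select", 12),
    ("counter RAM data setup", 30),
]


def cycle_time(path, overlapped=()):
    """Sum the stage delays (ns), skipping any stage named in `overlapped`."""
    return sum(delay for name, delay in path if name not in overlapped)


if __name__ == "__main__":
    hidden = ("FIFO access", "1K x 4 RAM access")
    for label, path in (("RAM-ALU", RAM_ALU), ("2901", AMD_2901)):
        print(label, cycle_time(path), "ns;",
              cycle_time(path, hidden), "ns with both accesses overlapped")
```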

However, if a latch is put on the 1K x 4 RAM, and phased clocks
are used, then the FIFO and RAM access times become embedded
within the machine cycle time. Hence, the cycle times are reduced
as follows:

RAM-ALU:

= 177-30-50
= 97 ns

2901:

= 280-30-50
= 200 ns

Figure 15 - µSP Address Generator RAM-ALU Configuration


2.2.3.4 Comparison

The two designs are now compared with respect to IC totals,
power, package area, and pin count.

DESIGN    PARTS        POWER (mW)      PINS         AREA (in²)

RAM-ALU   1-54163      150 = 150       14 = 14      .9 x 8 = 7.2
          8-54LS253    50 = 400        16 = 128     .8 x 4 = 3.2
          8-29705      550 = 4400      28 = 224     .24 x 15 = 3.6
          4-54S181     700 = 2800      24 = 96
          1-54S182     350 = 350       16 = 16
          1-54S158     250 = 250       16 = 16
          4-SSI        100 = 400       14 = 56

          27 IC        8.8 W           550 PINS     14.0 in²

DESIGN    PARTS        POWER (mW)      PINS         AREA (in²)

2901      1-54163      150 = 150       14 = 14      1.2 x 8 = 9.6
          8-54LS153    35 = 280        16 = 128     .24 x 16 = 3.8
          8-2901       1000 = 8000     40 = 320
          2-2902       350 = 700       16 = 32
          1-54S158     250 = 250       16 = 16
          4-SSI        100 = 400       14 = 56

          24 IC        9.8 W           566 PINS     13.4 in²
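The IC, power and pin totals can be recomputed from the per-part entries with a short tally. This sketch assumes the power entries are milliwatts per package, which is our reading of the table rather than a stated unit; the dictionary layout and names are illustrative.

```python
# part: (quantity, power per package in mW (assumed unit), pins per package)
RAM_ALU = {
    "54163": (1, 150, 14), "54LS253": (8, 50, 16), "29705": (8, 550, 28),
    "54S181": (4, 700, 24), "54S182": (1, 350, 16), "54S158": (1, 250, 16),
    "SSI": (4, 100, 14),
}
AMD_2901 = {
    "54163": (1, 150, 14), "54LS153": (8, 35, 16), "2901": (8, 1000, 40),
    "2902": (2, 350, 16), "54S158": (1, 250, 16), "SSI": (4, 100, 14),
}


def totals(parts):
    """Return (IC count, total power in mW, total pins) for a parts list."""
    ics = sum(qty for qty, _, _ in parts.values())
    power_mw = sum(qty * mw for qty, mw, _ in parts.values())
    pins = sum(qty * pin for qty, _, pin in parts.values())
    return ics, power_mw, pins


if __name__ == "__main__":
    print("RAM-ALU:", totals(RAM_ALU))
    print("2901:   ", totals(AMD_2901))
```

For the 2901 design the tally gives 24 ICs, 9780 mW and 566 pins.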

As seen from the two designs presented, the 2901 does not lose
very much to the discrete design, since 8 discrete RAM's and 4
adders are needed compared to 8 2901's. However, Signetics has
announced a two port 32 x 4 RAM. This will reduce the discrete
design by 4 IC's and about 100 pins. Further, the eight 4 to 1
selectors in each design could be replaced by a 300 gate array.
If the 16 bit address buffer outputs a tri-state signal, and if
the 12 bit quantity/increment field is also tri-state, then these
two signals can be tied together to produce one 16 bit input to
the 300 gate array. Bits 0 and 1 from the ADGN control RAM would
be used to control the tri-state output enables for the two
signals and would also be inputs to the 300 gate array. This
would reduce each design by 128-40 = 88 pins. Hence, with all
improvements the RAM-ALU design would use only 362 pins while the
2901 design would have 478 pins. The bigger RAM's for the RAM-ALU
design would also reduce the power by about 2 watts and the area
by about 3 in².

2.3 Memory

Memory sizes are directly driven by mission parameters, requiring
memory to be a modularly expandable item. The memory shapes are
determined by the degree of algorithm breakdown employed. We
postulate that a minimum of two separately addressed memory units
is needed to keep a macro-definable arithmetic pipeline busy.
Each of those memories can have data read from it and other data
written into it every macro cycle, with no restrictions on
allowed sequences of read and write addresses.

Data memory must supply inputs to and take outputs from the
arithmetic unit at the latter's fastest operating rate. We
postulate that each macro operation shall use no more than the
equivalent of two complex data words for input and two for output.
Hence, the memory may perform two reads and two writes at distinct
addresses during each macro execution time. The ratio between the
macro execution times and the memory cycle times will determine
the degree of multibucket memory parallelism required. Using a
four-cycle ratio avoids any software addressing restrictions, but
implies use of fast memory ICs for the working data space. A two
cycle ratio requires two separate data memory elements, but
lowers the speed, and hence cost, of memory IC's.
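The trade between memory speed and memory parallelism reduces to a one line calculation. The sketch below assumes, as the text does, four accesses (two reads plus two writes) per macro; the function name and the treatment of other ratios are illustrative.

```python
import math

ACCESSES_PER_MACRO = 4  # two reads plus two writes per macro execution


def memory_units_needed(memory_cycles_per_macro):
    """Separately addressed memory units needed to complete all accesses
    of one macro when each unit does one access per memory cycle."""
    return math.ceil(ACCESSES_PER_MACRO / memory_cycles_per_macro)


if __name__ == "__main__":
    for ratio in (4, 2, 1):
        print("ratio", ratio, "->", memory_units_needed(ratio), "unit(s)")
```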

The local memory element is blocked out in Figure 17. Input and
output data paths keep data in the same time sequence, allowing
for an arbitrary number of arithmetic pipelines between memory
output and input. The order of information on the data paths is
arranged to allow operation with either of two types of memories.
A memory which can read or write in one clock cycle can serve as
the address space for both memory one and memory two. Alternately,
a memory which can read or write in two clock cycles can serve as
the address space for one data memory unit, with two such memory
units required in parallel for system operation. This allows a
maximum of flexibility in choosing speed-density for local
memories.

Figure 17 - Micro Signal Processor Local Memory Element


Within the local memory element is a circuit that should be
considered for implementation by LSI logic, namely the conversion
from a unidirectional input bus to a pair of bidirectional busses
and from a pair of bidirectional input busses to a unidirectional
output bus. This is essentially a combination of four latches (or
registers) plus some tri-state drivers. The number of gates
involved, as well as the number of pins for a 4 or 6 bit slice,
is nominal. Such a design would find significant use elsewhere in
the world compared with today's available selection of bus
drivers and buffers.

The coefficient memory element block diagram is shown in Figure
18. A combination of RAM and PROM is provided. The PROM provides
normalized vector tables, such as used in FFTs and magnitudes, as
well as function weightings such as Hamming or Hanning. The RAM
provides task dependent parameters such as correction tables for
sensors, as well as a path for data to enter the pipeline from
the GP I/O bus. RAM vs PROM block selection is done by
interpreting the address LSB's as block selectors, with the next
bits used for table entry interpolation, and the remaining bits
(MSB's) going to memory addressing.

For missions having large storage needs, such as map
manipulation, an even slower bulk memory element is desired. We
postulate creating such a bulk storage element using denser and
slower type IC's, presumably dynamic RAM. The concept is outlined
in Figure 19, but will not be detailed in this study due to time
and budget limitations. Among the features are:

• Both unidirectional and bidirectional data busses.

• Formatting and address modifying logic for packing/unpacking
several short words into one 24 bit format.

• Refresh logic with smarts to avoid refreshing locations that
have been addressed recently enough.

The overhead associated with the above features is tolerable when
considering the drastic increase in memory packing density (a
factor of 4) achieved.
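The packing/unpacking feature can be illustrated with a small sketch. The report does not give the short word lengths, so the 8 bit and 6 bit widths used here, the MSB-first ordering, and the function names are all assumptions for illustration.

```python
BULK_WORD_BITS = 24


def pack(fields, width):
    """Pack 24 // width short words, first word in the MSB's."""
    count = BULK_WORD_BITS // width
    if len(fields) != count:
        raise ValueError("expected %d fields of %d bits" % (count, width))
    word = 0
    for field in fields:
        if not 0 <= field < (1 << width):
            raise ValueError("field does not fit in %d bits" % width)
        word = (word << width) | field
    return word


def unpack(word, width):
    """Inverse of pack: split a 24 bit word into short words."""
    count = BULK_WORD_BITS // width
    mask = (1 << width) - 1
    return [(word >> (width * (count - 1 - i))) & mask for i in range(count)]


if __name__ == "__main__":
    packed = pack([0x12, 0x34, 0x56], 8)
    print(hex(packed), unpack(packed, 8))
```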


Figure 19 - µSP Bulk Memory Element

Another area where significant improvement is possible is in the
combination of RAM and PROM with BITE. In the control elements -
the sequencer and the address generator - there must be some RAM,
but most code should be in PROM. Similarly, the coefficient
memory can be divided into frozen and writable memory. Moreover,
all three cases would like to allow the path from the GP I/O bus
to write and read such data for BITE purposes. Combining RAM and
PROM is now occurring in the slower MOS µP world, and should be
considered now for the higher speed µSP environment (see Figure
20).

Figure 20 - RAM/PROM/BITE Chip

2.4  Arithmetic  Elements 

The micro signal processor arithmetic element design is a
compromise between small size and large thruput per pass.
Multiplier and add/subtractor ratios can be derived from task
analysis. Other function capability should be included in
hardware where computation alternatives are too slow and the ICs
involved are few, such as leading zero detection and scaling. To
correct a common weakness of GP computers, enough registers must
be included for minimizing storage overhead for temporary results.

Organization of registers and arithmetics is key to achieving
thruput efficiency with an easily programmed design. Our mini
GPSP experience advocates a compromise between the structured
simplicity of pipelining and the unstructured flexibility of
microprocessor CPUs. We propose each pipeline stage to have an
arithmetic function connected to a multiport register file. Such
a pipeline can execute a macro having P times as many operations
as hardware by performing P calculations at each stage on its
data block before passing the results down the line.

Data will then enter the arithmetic pipeline unit at the same
time as the associated command, with timing of macro to micro
control bit conversion to match data flow speed. We postulate
that the required phased control decoder can be implemented with
PROM once the macro command set is defined, just as currently
done for GP computers, with room for future macro expansion. The
macro commands associated with data leaving the pipeline are
similarly generated to facilitate multiple AU configurations.

Control of the arithmetic pipeline is distributed between the
sequencer, where a macro command is generated, and the pipeline
stages, where a macro acts. One special macro is reserved, called
SAME, to indicate that the same macro executed during the last
macro cycle should continue. The macro command itself is
distributed through the pipeline as a serial bit stream, to
minimize pipeline connections and to allow chaining of pipeline
stages. Recommended, but not necessary, is the feeding of the
macro output from the pipeline back to the sequencer, to provide
for BITE checks. Figure 21 shows the proposed µSP macro decoding
logic.

The arithmetic pipeline configuration will consist of three
stages: scaling, multiplication, and double adder/accumulator.
The design has been conceived to minimize data routing. The
expected performance capability is thus limited to a macro
command involving no more than 4 multiplies and 8 additions. For
upward family compatibility reasons, the configurations
achievable with the micro signal processor pipeline should all
have a full parallel equivalent. Efficiency of doing double
precision calculations, particularly with single precision
coefficients, is also an attractive feature of this AU design.

Figure 21 - µSP Macro Decode Logic

As shown in Figure 22, the first stage does scaling and
approximate vector angle determination. The scale factor can be
set by counting the number of leading zeros of a special data
word, which is usually part of the block floating mechanism.
Single precision complex as well as double length real and double
length complex formats are accommodated. True floating point can
be obtained by doing a block floating point over one word
vectors, although this is inefficient in thruput and storage.
Figure 23 lists a set of control bit definitions for micro coding
the scaling element operation.
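The leading zero count and its use as a block scale factor can be sketched as follows. The sketch assumes a 12 bit signed data word (suggested by the 12 by 12 multiplier) and takes the "special data word" to be the largest magnitude in the block; both are assumptions, as are the function names.

```python
WORD_BITS = 12  # assumed single precision word length, sign bit included


def leading_zeros(value, bits=WORD_BITS):
    """Zero bits above the most significant one of a non-negative value."""
    return bits - value.bit_length()


def block_scale(block, bits=WORD_BITS):
    """Shift a whole block left by a common amount so that its largest
    magnitude just fills the magnitude bits. Returns (scaled block, shift)."""
    peak = max(abs(x) for x in block)
    shift = max(0, leading_zeros(peak, bits - 1)) if peak else 0
    return [x << shift for x in block], shift


if __name__ == "__main__":
    scaled, shift = block_scale([5, -3, 1])
    print(scaled, "shared exponent", -shift)
```

A one word block is the degenerate case the text calls true floating point: every word then carries its own shift count.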

Within the scaling element are two candidates for a special
implementation, a shifter and an angle estimator chip. The
commercial IC world has started to recognize the need for the
former, as witness Motorola's plan for an ECL shift barrel. The
angle estimator is a key part of our solution to the magnitude
question. Rather than rely on the traditional "larger plus half
smaller" or variations, an approximation scheme, now being
patented, can reduce the error from +6% to below +1.4%. This
stops the trade of system performance to save a few gates.
Furthermore, more accurate magnitudes are possible by iterative
processing. Finally, this angle estimator chip simplifies
monopulse computations.
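For reference, the error of the traditional estimate can be measured numerically. The sketch below sweeps a unit vector through all angles; the plain "larger plus half smaller" form overestimates by up to about 11.8%, and a variation with a fixed gain that centres the error brings the peak to about 5.6%. The gain constant is our own choice for illustration, and the patented scheme itself is not modelled here.

```python
import math


def larger_plus_half_smaller(i, q):
    """Traditional magnitude estimate: max + min / 2."""
    big, small = max(abs(i), abs(q)), min(abs(i), abs(q))
    return big + small / 2


def centred(i, q, gain=2 / (1 + math.sqrt(1.25))):
    """A variation: scale so the error is spread equally about zero.
    The gain is an illustrative choice, not taken from the report."""
    return gain * larger_plus_half_smaller(i, q)


def peak_relative_error(estimator, steps=3600):
    """Worst relative error of the estimator over a unit circle sweep."""
    worst = 0.0
    for k in range(steps):
        angle = 2 * math.pi * k / steps
        estimate = estimator(math.cos(angle), math.sin(angle))
        worst = max(worst, abs(estimate - 1.0))
    return worst


if __name__ == "__main__":
    print("plain   :", round(peak_relative_error(larger_plus_half_smaller), 4))
    print("centred :", round(peak_relative_error(centred), 4))
```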

The second pipeline element is the multiplying unit shown in
Figure 24. This element does 12 by 12 multiplications with the
possibility of saving and working with all 24 result bits if
desired. Data ordering for the third stage is actually performed
within this second stage, in the order of multiplication. One of
the two inputs to this element comes from the first stage, while
the other comes from the coefficient memory. Because of the
flexible routing employed, the coefficient memory path can be
used as a means of introducing fixed constants and even addresses
into the data stream. Figure 25 lists the control bits which must
be specified for each clock cycle of operation.

A data route/delay function has been outlined in Figure 24. Such
a chip design has usefulness within the µSP in the multiplier,
the adder and the address generator elements. Delay sizes do not
exceed 32 and are typically in the 8-16 range. This chip type is
thus not intended to replace RAM, as done in RCA's CMOS FFT chip
set. We see the real problem as combining the switching of input
and output paths efficiently in the presence of read while write
RAM.

The third pipeline element contains two ALU's, accumulator
registers, and a peak detector as shown in Figure 26. This stage
does the additions required to complete a complex multiplication
as well as the "butterfly" type combinations. Both single and
double length calculations are possible. A listing of possible
adder element micro control bits is shown in Figure 27.

Implementation of the arithmetic pipeline doesn't fit the 2901
type RALU design very well. The quantity of working registers
needed is typically 4 rather than 16, and only a few arithmetic
functions are required. The only available IC's for implementing
the scaling requirement are the low density priority encoders and
4 bit shifters, and perhaps a multiplier used wastefully as a
shifter. Hence, fertile ground exists for IC type improvement in
this area.

The above design for the multiplier and adder element has favored
functional simplicity at a cost of extra micro control
complexity. A viable alternative is to combine a long adder with
the multiplier element to finish the complex or double length
multiplications there. Such an alternative has been used before
at Raytheon, and reduces the flexibility required in the final
pair of adders. However, we opt here for having a more flexible
final stage in order to accommodate the strong tendency seen to
specialize programmable signal processors by adding an extra
CFAR-type post-processor into the arithmetic pipeline. We expect
to be able to accommodate many members of the "CFAR of the month"
algorithm club with the three element pipeline. For those cases
where significant improvement is possible by giving in, we need
add only another identical adder element.

Figure 24 - Multiplying Element

Figure 25 - Multiplying Element Macro Bits

Figure 26 - µSP Adder Element

Figure 27 - Adder Element Micro Bits

2.5 Arithmetic Macros

From the analysis which is presented in section 3, we have
identified the following list of basic signal processing macros:

Vector normalize
Vector scale and add
Sum of vector elements
Vector dot product
Peak detect
Complex vector multiply
Accurate magnitude
Correlation
Complex FFT
FIR
IIR
Sliding window sum

From the above list, most of the macros can be derived in a very
straightforward manner, because the functions fall obviously out
of the element definitions. For example, accurate magnitude
happens by a complex word generating an angle in the scaling
element, which pulls out the appropriate unit vector from the
coefficient memory, which rotates the complex word through
multiplication and addition.
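The accurate magnitude macro can be mimicked in software to show why a coarse angle is enough. In the sketch below the angle estimate selects one of 16 stored unit vectors and the word is rotated onto the real axis by a complex multiply. The sector count, the table contents and the function names are hypothetical; the report's angle estimator is not described in enough detail here to reproduce it.

```python
import cmath
import math

SECTORS = 16  # hypothetical number of unit vectors in coefficient memory
WIDTH = 2 * math.pi / SECTORS

# Conjugate of each sector centre, so that multiplying rotates the
# word back onto the real axis to within half a sector.
UNIT_VECTORS = [cmath.exp(-1j * (k + 0.5) * WIDTH) for k in range(SECTORS)]


def estimate_sector(z):
    """Coarse angle estimate: index of the sector containing z."""
    return int((cmath.phase(z) % (2 * math.pi)) / WIDTH) % SECTORS


def accurate_magnitude(z):
    """Rotate z by the selected unit vector and keep the real part."""
    return (z * UNIT_VECTORS[estimate_sector(z)]).real


if __name__ == "__main__":
    print(accurate_magnitude(3 + 4j))
```

With 16 sectors the residual angle is at most 11.25 degrees, so the result is low by less than 2%; a finer table or a second pass reduces this further.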

The operation of the FFT is a little more complicated and is thus
shown in Figure 28. Note that the adder pair is used wastefully,
doing 8 adds when only 6 are required. This inefficiency is
illusory because during the 4 clock cycles 2 adds would otherwise
be unused. Extension of this concept to definition of the set of
three macros required for double length (24 + j 24 bits) word
FFT's is shown in Figure 29. Note that the locations of the data
in and data out are properly lined up to allow only two local
memory addresses per macro cycle, even with double length data.
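The count of 4 multiplies and 6 adds is visible when a butterfly is written out in real arithmetic. The decimation-in-time form used below, and the constant names, are assumptions made for illustration; Figure 28 defines the actual macro.

```python
def butterfly(ar, ai, br, bi, wr, wi):
    """Radix-2 butterfly on real and imaginary parts.

    Complex product b * w: 4 multiplies and 2 adds.
    Sum and difference outputs: 4 more adds.
    """
    pr = br * wr - bi * wi
    pi = br * wi + bi * wr
    return (ar + pr, ai + pi), (ar - pr, ai - pi)


MULTIPLIES = 4
ADDS_NEEDED = 6
ADDERS = 2
CLOCKS_PER_MACRO = 4              # one multiply per clock on one multiplier
ADD_SLOTS = ADDERS * CLOCKS_PER_MACRO

if __name__ == "__main__":
    print(butterfly(1.0, 2.0, 3.0, 4.0, 0.0, 1.0))
    print("add slots:", ADD_SLOTS, "needed:", ADDS_NEEDED)
```

Two adders running for the four clocks occupied by the multiplies give 8 add slots, of which the butterfly needs 6.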

The IIR type filter is shown partitioned in Figure 30 into the
three macros described in Figure 31. The first macro beats the
input point with the desired oscillator, and leaves the result in
the accumulators. The local memory is used to store all filter
residues. Thus a pair of zeros is computed by combining two delay
element outputs with the accumulator holdover. Similarly, the
macro for the pole pair ends up storing two complex numbers. In
this fashion the number of filter poles and zeros is not limited
by the amount of storage contained within the arithmetic
pipeline.

Partitioning of the CFAR algorithm for the µSP arithmetic
pipeline is shown in Figure 32. Two macros are involved. The
first forms a magnitude and a sliding window accumulation. The
second macro does the actual scaling of the threshold value and
comparison against the target value. Use of the adders for
condition testing can then tie into saving the addresses of the
selected bins in the address generator.
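The two CFAR steps can be sketched in a few lines: a running window sum stands in for the first macro, and the scaled threshold comparison for the second. This is a simplified illustration, not the partitioning of Figure 32: it uses a trailing window with no guard cells, and the window length, scale factor and function name are arbitrary.

```python
def cfar_detect(magnitudes, window, scale):
    """Return the indices of bins that exceed `scale` times the mean of
    the `window` bins immediately before them."""
    hits = []
    running = 0
    for k, value in enumerate(magnitudes):
        if k >= window:
            # second macro: scale the window average and compare
            threshold = scale * running / window
            if value > threshold:
                hits.append(k)          # address of the selected bin
            running -= magnitudes[k - window]
        # first macro: sliding window accumulation
        running += value
    return hits


if __name__ == "__main__":
    print(cfar_detect([1, 1, 1, 1, 10, 1, 1, 1], window=4, scale=3))
```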

Figure 30 - IIR Filter Partitioning

Figure 31 - IIR Filter Macros (3 Types)

Figure 32 - CFAR Algorithm Partitioning

2.6 System Considerations

A simple connection of µSP elements into a system with one of
each type element has already been presented. This section
considers variations on the interconnection possibilities.

Simple paralleling or pipelining of elements is always possible,
such as shown in Figure 33. Conversely, selected elements can be
employed to make a compact hard-wired function, such as the FFT
unit shown in Figure 34.


More complex netting of systems is also possible, but this
deserves more than the usual lip service. Particular
consideration should be given to the problem of routing all the
high speed data paths in a netted system of parallel and
pipelined elements. For this, we propose the netted system router
box depicted in Figure 35. Serial transmission is employed
between stages to minimize the number of wires entering each
identical unit, while still allowing arbitrary interconnections
for fault tolerant operation. This concept of a corner-turning
memory has equivalent designs for both analog and digital
switching interfaces.

Figure 33 - System Example of Micro Signal Processor

Figure 35 - Netted System Router Box

At some point, netting small µSP's should give way to use of a
more powerful unit, here called a MAXI SIGNAL PROCESSOR. We
postulate that there should be, for the same technology
implementation, a factor of four difference in thruput and size
between these two units. Certain compatibilities can be preserved
between the two, leading to the concept of a family of
programmable signal processors. The following lists some of the
ways to grow from the µSP design to this Maxi µSP.


SEQUENCER           * DOUBLE-BUCKET ON TWO UNITS
                    * LOOK-AHEAD ON COUNTERS

ADDRESS GENERATOR   * PARALLEL TWO OR FOUR UNITS

DATA MEMORIES       * SEPARATE MEMORY ONE AND TWO
                    * DOUBLE BUCKET EACH FOR READ WHILE WRITE

ARITHMETICS         * PARALLEL FOR COMMON OPERATIONS ON
                      DIFFERENT DATA
                    * SERIES FOR MORE COMPLEX MACROS
                    * INTERLEAVE FOR HIGHER THRUPUT
                    * PUT IC'S INTO SINGLE PHASE VERSION

SPECIAL FEATURES    * CONFIGURE AU STAGES AS NEEDED FOR
                      HARDWIRED FUNCTIONS

SECTION III

TECHNICAL DISCUSSION


3.1 Problem Statement

A typical avionics processing system is shown in Figure 36 to
illustrate the signal processing tasks to be discussed. The
signal processor input is a video or IF signal from the
receivers. Its output is the processed and reduced data in
digital form, suitably formatted. This output is transferred to a
control computer or display.

The signal processing task is, in brief, to flow a set of data
through a sequence of filters one after the other. The objective
is to tag the location of a handful of target data points in the
vast quantity of noise and/or clutter. A typical filter algorithm
is a sequence of operations consisting of memory storage,
arithmetic operations (such as addition, subtraction,
multiplication) and some data switching.


Figure 36 - Signal Processor in Typical System

Note also that Figure 36 does not distinguish between those
signal processing tasks which are performed by analog hardware
and those done digitally. The exact boundary is a function of the
performance desired for the price and the state of technology,
rather than any fundamental limitations. That boundary is
constantly changing in favor of more digital processing. Thus, an
all-digital signal processing element is developed, based on
consideration of all the required basic signal processing
algorithms. Furthermore, this study shall concentrate on
programmable approaches to digital signal processing, rather than
hardwired (e.g., fixed function) digital processing, because of
the system advantages outlined in Table 9.

TABLE 9

PROGRAMMABLE DIGITAL SIGNAL PROCESSING ADVANTAGES

Topic                   Programmable Digital  Analog              Hardwired Digital

Hardware utilization    High                  Low                 Low
on multimode
applications

Reliability             Graceful degradation; Close temperature   Duplication
improvement             fractional redundancy control;
techniques                                    duplication

Number of module        Few                   Many                Many
types

Ease of accommodation   Software change       New module designs  More modules
of new modes

Ease of parameter       Software changes      Component changes   At least system
changes                                       at least            wiring changes

Cost of dynamic range   Automatic scaling is  Device precision    Word length and
considerations          incorporated with no  determines cost     position varies
                        losses                and accuracy        from one function
                                                                  box to another

Efficient operating     Batch or real-time    Real-time           Real-time or
modes                   or delayed                                delayed

With a general purpose (GP) computer, data points are fed into
the main memory either once before calculations or continually on
an interrupt basis. A single limited-capability arithmetic
section performs all of the required transformations under
program control. Each arithmetic calculation (i.e., add or
subtract) requires a separate instruction. A sequence of
instructions for a given transformation forms a subroutine, and a
particular sequence of subroutines corresponds to one processing
mode. No specialized hardware design is required.

Use of GP architecture does require, however, that sufficient
computation time be available. Data sets must arrive with enough
time between successive sets to perform all the required
calculations. Alternatively, the average time between arrival of
data points on an interrupt scheme must be enough to do an
appreciable amount of computation. Problems occur if data arrives
at higher rates. A typical GP computer architecture becomes
limited on any or all of the following:

• The  input/output  interrupts  reduce  available  time  for  real 
computation 

• The  time  to  set  up  each  calculation  and  do  bookkeeping 
further  reduces  actual  computation  time 

• The  quantity  of  arithmetic  operations  required  may  just 
exceed  the  maximum  possible  capability  of  the  arithmetic 
hardware 

• Parallel  processing  adds  arithmetic  capability,  but  also 
adds  much  overhead  for  coordination  and  system  allocation 

Signal processing thruput requirements are higher than those
obtainable by GP approaches. Using today's LSI technology, a
16 bit minicomputer with overlapped operation can give add times
of 0.25 μsec and multiply times of 5 μsec for a thruput of 700 to
1200 KOPS (thousands of mixed operations per second). Speed-up
appendages such as a hardware multiplier and shift barrel can
further extend this GP KOP rate two or three times more. However,
thruputs at least an order of magnitude higher require
exploiting the restrictive nature of signal processing tasks, such
as the regular repetition of identifiable macro arithmetic operations.
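
The quoted KOPS range can be related to the add and multiply times with a simple weighted average. The sketch below is illustrative only: it assumes an add time of 0.25 microseconds and a multiply time of 5 microseconds, and the function name and the multiply fractions tried are hypothetical, not taken from the study.

```python
def mixed_kops(add_us, mul_us, mul_fraction):
    """Thousands of mixed operations per second for a given add/multiply mix.

    add_us, mul_us : operation times in microseconds (assumed values).
    mul_fraction   : fraction of operations that are multiplies (hypothetical).
    """
    avg_us = (1.0 - mul_fraction) * add_us + mul_fraction * mul_us
    # 1e6 microseconds per second, divided by 1e3 to express the rate in KOPS
    return 1000.0 / avg_us


for frac in (0.125, 0.25):
    print(f"{frac:.1%} multiplies -> {mixed_kops(0.25, 5.0, frac):.0f} KOPS")
```

Under these assumptions, a mix of roughly one multiply in eight down to one in four spans approximately the 1200 to 700 KOPS range quoted above.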

3.2 Key Principles and Techniques

The design of the micro signal processor elements (μSP) will
be derived by combining top-down analysis of avionic processing
requirements with probable future technology directions and
relevant computer architecture concepts. Our experience on similar
efforts indicates that a workable degree of linear separation
exists between these investigations. Each such investigation
adds restrictions on the range of viable alternatives open for
the design details of the μSP, leaving the outline of the advocated
approach.

Some initial constraints can be placed on the μSP by a quick
view of signal processor application boundaries over ground,
airborne and missile applications. Table 10 summarizes the normal
signal processing categories seen at Raytheon for recent real and
proposed applications. Thruput, as measured in terms of real
multiplies per second, varies from 0.3 MHz to 60 MHz rates.
Storage needs vary from a few thousand bits to a few million bits.
Consequently, signal processor sizes vary from thirty chips to
ten-thousand chips, with power consumption ranging accordingly.
Satisfying this three orders of magnitude variation cannot be
done effectively with one design. Hence, we shall eliminate
the applications on the large end as unsuitable for defining a
μSP. Also, the smallest missile tasks will not be considered
since they end up as extremely specialized implementations (at
least today). Remaining are the small ground applications,
small to medium airborne applications, and medium to large missile
applications. RPV applications also are suitable for this μSP,
being equivalent to a large missile in computation requirements.
APPLICATION CONSTRAINTS

Use      | Throughput             | Memory     | IC Quantity | Power | Card Area   | Packaging Notes
         | (Multiplies/sec) (MHz) | (Kilobits) | (K)         | (W)   | (in.^2)     |
---------+------------------------+------------+-------------+-------+-------------+----------------------------------
Ground   | 60                     | 2,000      | 10          | 5,000 | [illegible] | Almost any scheme will do
         | 6                      | 200        | 0.7         | 200   |             |
Airborne | 10                     | 600        | 1           | 900   |             | ATR dimensions, cooling problems,
         | 2                      | 30         | 0.2         | 500   |             | minimum cable quantity
Missile  | 60                     | 300        | 100         | 120   |             | Battery and cooling problems,
         | 0.3                    | 3          | 30          | 10    |             | special card shapes, serial I/O
         |                        |            |             |       |             | essential

Micro Signal Processor Application Objectives:


• Ground - Small Sizes

• Airborne - Small to Medium Sizes

• Missile - Medium to Large Sizes


Signal processing involves the high speed manipulation of
sensor data to extract the significant information from the
background. Processing also depends on the end use of that
information.


• Radar signals - are detected against a white noise background
by matched filtering, to enhance signal-to-noise
(S/N) ratio and remove clutter masking targets

• Image processing - improves image quality by contrast
enhancement, edge enhancement, etc., or extracts prominent
features from the background to simplify transmission
or subsequent calculations like correlation or tracking

• ECM - includes sorting of video pulse descriptions by arrival
angles and center frequency until pulse repetition
intervals can be accurately determined

• Communications - includes speech compression and video
bandwidth compression in preparation for secure or
jam-resistant transmission or compressed data storage.
Our approach to the analytic aspects of designing these
programmable signal processors thus proceeds top-down as follows:

Mission - Analysis of system modes and their constituent
algorithms bounds thruput, storage, and
algorithm variety needs

Function - Analysis of the required algorithms and
alternative computation approaches bounds ratios
of calculation components, word formats, and
interconnections

Environment and Technology - Analysis of a particular
application environment and a technology snapshot
constrains logic type, packaging and many processor
features

Architecture - From analysis of competitive architectures,
firmware tradeoffs, and past experience,
modularity directions are defined and a design detailed

Candidate Fit - Trying the postulated design against
the evolving system modes tells the size, thruput,
ease of use and other measures of design
cost/effectivity.

Note  that  mission  analysis  is  listed  before  function  analysis. 
Some  basic  function  analysis  can  occur  independent  of  the  mission 
analysis.  However,  determination  of  arithmetic  thruput  rates, 
ratios,  and  accuracy  depends  on  first  defining  example  missions 
to  ensure  algorithm  exhaustiveness  and  to  explore  alternative 
orderings and computation schemes. Our experience is that there
is a factor of two in performance to be obtained by careful task
analysis and manipulation to exploit the programmability of digital
SPs. Hence performance analysis started under this study
at the same time as function analysis.
3.3 Performance Analysis


A signal processor has several levels of programmability.
At the highest level are the modes of a system mission which
enhance detectability of different kinds of targets in various
environments. These are composed of processing algorithms such
as described in the preceding subsection. The latter are, in
turn, formed by a number of passes through the arithmetic unit,
with each pass being a macro instruction execution. Macros are
created by combining the micro command bit fields of the adders,
multipliers, scalers, working registers, and routing.
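
As an illustration of how a macro could be assembled from micro command bit fields, the sketch below packs one field per resource into a single control word. The field names follow the resources listed above; the field widths, their ordering, and the function names are hypothetical and are not taken from the μSP definition.

```python
# Hypothetical field layout: (resource name, width in bits), most significant first.
FIELDS = [("adder", 3), ("multiplier", 2), ("scaler", 3),
          ("register", 4), ("routing", 4)]


def pack_macro(**values):
    """Combine per-resource micro command fields into one macro control word."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} field does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_macro(word):
    """Split a macro control word back into its micro command fields."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


macro = pack_macro(adder=5, routing=9)
print(hex(macro), unpack_macro(macro))
```

Each pass through the arithmetic unit would then be selected by one such word, so a processing algorithm reduces to a sequence of macro words.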

In  this  subsection  several  different  signal  processing 
missions  are  examined  in  general  and  in  detail.  Examples  include: 
synthetic  aperture  strip  map,  A/A  search  and  track,  fast  and  slow 
ground  moving  target  indication,  RF  signal  sorting  and  class- 
ification, and  voice  coding. 

3.3.1  Strip  Map  - General 

In strip mapping, a large area map is synthesized
from a set of smaller maps, each of which has a relatively small
number of range cells. The system flow diagram is shown in
Figure 37. Complex data from A/D converters first goes through a
Barker code pulse compression and then motion compensation. Phase
shifting of the data with complex multiplications removes the
frequency off-set due to antenna squint and frequency changes due
to geometric distance from map center. Next, a weighted sum of
several PRF samples, formed for each mapped range interval by a
low-pass FIR filter, is buffered and corner turned to gather all
samples from the same range cell on the ground.

Range data is then spectrum analyzed, with weighting
applied before the FFT to reduce sidelobes. After FFT
calculations, doppler cells at the ends of the spectrum are dropped.
Magnitude and integration of the data yields a map for display.
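
The spectrum analysis step for one range cell can be sketched as follows. This is a minimal illustration under stated assumptions: a Hann weighting is assumed (the text does not name the weighting), a direct DFT stands in for the FFT for brevity (it produces the same values), and the number of edge doppler cells dropped is a hypothetical parameter.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def doppler_magnitudes(samples, drop=1):
    """Weight, transform, drop the end doppler cells, and take magnitudes."""
    n = len(samples)
    # Assumed Hann weighting applied before the transform to reduce sidelobes
    weights = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
    spectrum = dft([s * w for s, w in zip(samples, weights)])
    # Reorder so zero doppler sits in the middle, then drop cells at both ends
    centered = spectrum[n // 2:] + spectrum[:n // 2]
    kept = centered[drop:n - drop]
    return [abs(c) for c in kept]


tone = [cmath.exp(2j * math.pi * 3 * t / 16) for t in range(16)]
print(doppler_magnitudes(tone))
```

Integration over successive looks, which the text also mentions, would add these magnitudes cell by cell before display.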
3.3.2 A/A Search and Track - General


The  A/A  search  and  track  system  chosen  uses  low  PRF 
waveforms  when  looking  above  the  horizon  and  high  PRF  waveforms 
when  looking  below  the  horizon.  In  both  modes,  the  waveforms  are 
processed  coherently.  Combinations  of  PRF  variation  and  multiple 
looks  provide  for  resolution  of  range  and  doppler  ambiguities. 

We  will  explain  a high  PRF  operation  with  eight  range  gates. 

Track  waveforms  are  the  same  as  the  search  waveforms  but  with 
two  of  the  range  cells  filled  by  data  from  the  monopulse  differ- 
ence channels. 

The  flow  chart  of  the  system  is  shown  in  Figure  38. 
Pulse  compression  has  not  been  included  in  this  system.  First, 
correcting  the  signal  for  I/Q  unbalance  in  the  quadrature  demod- 
ulator permits  discrimination  of  the  target  from  images  in  the 
doppler  region  representing  large  amounts  of  ground  clutter. 
Correction  factors  are  determined  by  pilot  pulse  measurements. 


Figure 37 - SAR Strip Mapping Flow

Figure 38 - A/A Search and Track Flow

Then motion compensation by phase shifting the signal occurs,
to position the ground clutter for removal by an FIR or an IIR
roughing filter. The data is then buffered until all points
have been acquired for a spectrum analysis. The spectrum analysis
uses time weighting at the FFT input to reduce the
sidelobes.


Subsequent  operations  depend  on  the  operating  mode.  In 
the  search  mode,  the  complex  FFT  outputs  are  magnituded,  followed 
by  CFAR  detection.  In  the  track  mode,  the  antenna  pointing  error 
is  computed  from  the  outputs  of  the  sum  and  difference  channels 
at  the  predicted  target  range  and  doppler.  Target  range  and 
velocity  interpolations  are  also  made. 
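
The search-mode CFAR step can be sketched as below. The text does not say which CFAR variant is used, so a cell-averaging CFAR is assumed here; the guard and training cell counts, the threshold scale, and the function name are all hypothetical.

```python
def ca_cfar(mags, guard=1, train=4, scale=4.0):
    """Cell-averaging CFAR over a list of FFT output magnitudes.

    For each cell under test, the noise level is estimated as the mean of
    the training cells on both sides (skipping the guard cells), and a
    detection is declared when the cell exceeds scale times that mean.
    """
    hits = []
    n = len(mags)
    for i in range(n):
        training = [mags[j]
                    for j in range(i - guard - train, i + guard + train + 1)
                    if 0 <= j < n and abs(j - i) > guard]
        if training and mags[i] > scale * sum(training) / len(training):
            hits.append(i)
    return hits


cells = [1.0] * 20
cells[10] = 10.0
print(ca_cfar(cells))
```

Because the threshold follows the local average, the false alarm rate stays roughly constant as the clutter or noise level changes across the spectrum.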

3.3.3  Ground  Moving  Target  Indication  - General 

GMTI systems are concerned with fast and slow moving
targets. We differentiate between these two by noting that an
area on the ground will be illuminated by the main beam of the
radar. The doppler shift of fast targets competes only against
sidelobe returns of fixed reflectors, so fast targets are easily
detected and tracked with roughing filtering and spectrum analysis.

Slow targets present the more difficult resolution problem
because their doppler shift competes with the mainlobe returns of
fixed targets due to aircraft velocity and pointing angles. A
technique under development at Raytheon uses monopulse sum and
difference ratios for all instrumented range/doppler cells. In
cells containing a slow target and some ground clutter, the real
and/or the imaginary part of the monopulse ratio will fall
outside its expected value, allowing detection. In Figure 39 the
sum and difference signals flow through identical processing
initially. Motion compensation corrects for nonuniform aircraft
velocities and centers the main doppler beam on zero
frequency. A roughing filter is applied to each range cell in
the interval where the difference channel is well behaved.

Figure 39 - A Slow GMTI System

After sampling rate reduction, data is buffered, then spectrum
analyzed with a weighted FFT. Next, a complex monopulse ratio
is formed for each of the instrumented range doppler cells.
Then for each doppler cell of the sum and difference channel,
an averaging over range is done to establish ground clutter
statistical measures. These statistical measures set thresholds
against which the complex monopulse ratio will be compared to
obtain target indications.

3.3.4  Radar  Air  to  Ground  Processing  - Detail 

A representative set of processing requirements was
postulated for an advanced air-to-ground radar in
order to detail the algorithms required and their
loads. Two major modes were considered:

1)  Strip  mapping  and  ground  moving  target  detection 
(GMTD) 

2)  Spotlight,  ground  moving  target  track  (GMTT) , and 
missile  retransmission  processing  (RETRAN) . 

These  are  essentially  search/acquisition,  and  target  track/ 
missile  guidance.  For  the  STRIP  MAP  and  GMTD  mode,  basic  pro- 
cessing uses  a presumming  filter  for  bandwidth  reduction  while 
the  second  mode  uses  FFT  roughing  filters.  Also  mode  1 uses 
complementary  Golay  code  pulse  compression  with  side  lobe  im- 
provement by  addition  of  sequential  returns,  while  mode  2 uses 
Golay  code  pulse  compression  by  replica  convolution  only. 
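
The sidelobe improvement obtained by adding sequential returns can be illustrated with a complementary Golay pair. The sketch below is an assumption about what the text intends: one code of the pair is sent on each of two successive pulses, each return is compressed against its own code, and the two compressed outputs are added. The code length and function names are hypothetical.

```python
def golay_pair(doublings):
    """Build a complementary Golay pair of length 2**doublings."""
    a, b = [1], [1]
    for _ in range(doublings):
        # Standard recursive construction: (a|b, a|-b) is again complementary
        a, b = a + b, a + [-x for x in b]
    return a, b


def autocorr(code):
    """Aperiodic autocorrelation for non-negative lags."""
    n = len(code)
    return [sum(code[i] * code[i + lag] for i in range(n - lag))
            for lag in range(n)]


a, b = golay_pair(3)
summed = [ra + rb for ra, rb in zip(autocorr(a), autocorr(b))]
print(autocorr(a))
print(autocorr(b))
print(summed)
```

Each code alone has non-zero range sidelobes, but the sidelobes of the two codes are equal and opposite, so the sum keeps only the main peak. Compression by replica convolution only, as in mode 2, corresponds to using one such correlation without the addition.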

For mode 1, the flow diagram is presented in Figure
40. An analysis of the computation requirements of each of the
constituent functions is presented in Table 11. Clearly, the
processing up through motion compensation dominates the total,
with Golay pulse compression dominating most of the loading.
For example, with typical PRF rates, the Golay code processing
is up in the area of several hundred million adds per second.
This is beyond the range where the μSP, even in netted systems,
is appropriate, leading to the recommendation that this front-end
task be considered for specialized sub-nanosecond logic
implementation. Conversely, the processing after the weighted
presumming is easily within the postulated capability of only
one μSP.

The flow chart and functional analysis for SPOTLIGHT,
GMTT and RETRAN mode are given in Figure 41 and Table 12
respectively. It is assumed that missile returns for sum and
difference channels are frequency multiplexed together. FFT
roughing is used to de-multiplex the spectrum. Again the
major computation loading is around the Golay code pulse
compression. The FFT roughing filter, however, now represents a
significant computation load, say 5 to 10 μSP's worth.




Figure  40  - Flow  Chart  Spotlight  GMTT  and  Retran 


FUNCTIONAL  ANALYSIS  SPOTLIGHT  MAP,  GMTT  & RETRAN 


Figure  41-  Flow  Diagram  Strip  Map  and  GMTD 
All the remaining processing, including the presum, FFT, and
post processing generally can be placed into one μSP.

For both modes a significant amount of bulk memory
is required, somewhere in the vicinity of 2 to 8 million bits.
When this amount is made of 4K bits per chip, system
size is dominated by memory modules. Hence, the use of 16K
bits per chip is necessary for memory sizes to be comparable
to the sizes obtainable with LSI processors. Shift
register type memories severely restrict the choice and sequence
of processing algorithms. Hence, we emphasize the choice
of random access memory, even though this entails dynamic memory
elements and the problems of meshing refresh cycles into
the high thruput processing scheme.
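
The effect of chip density on memory module count is simple arithmetic, sketched below. The sketch assumes "million bits" means 10^6 bits and that 4K and 16K chips hold 4096 and 16384 bits; the function name is hypothetical.

```python
import math


def chips_needed(total_bits, bits_per_chip):
    """Number of memory chips needed to hold total_bits."""
    return math.ceil(total_bits / bits_per_chip)


for total in (2_000_000, 8_000_000):
    for density in (4096, 16384):
        print(f"{total:>9} bits with {density:>5}-bit chips: "
              f"{chips_needed(total, density)} chips")
```

Under these assumptions the bulk memory needs roughly 500 to 2000 chips at 4K bits per chip but only about 120 to 500 chips at 16K bits per chip, which brings it closer to the 200 to 1000 chip processors listed for airborne use.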

Trade-offs exist between storage and processing,
without degrading display performance. Smaller presumming sizes
require larger FFTs with selection of fewer outputs from the
FFTs. Without taking sides on this perennial design question,
we prefer to push the technological capability to provide
significantly greater amounts of both storage and programmable
thruput capability in smaller spaces. That is the fundamental
thrust of μSP development.

3.3.5  Linear  Predictive  Processing 


3.3.5.1 Introduction


This  subsection  details  some  typical  processing 
tasks  involved  in  a communication-type  application.  Adaptive 
predictive  processing  represents  a category  of  computations  which 
can  be  handled  efficiently  within  a micro  signal  processor. 

This  category  is  also  representative  of  higher  thruput  video 
communications  signal  processing,  where  the  primary  objective 
is  a drastic  reduction  in  the  video  bandwidth  necessary  to 
represent  the  sensor  information  before  entering  into  a jam- 
resistant  transmission  process. 
A growing interest can be expected in the application
of these algorithms to radar target detection and
tracking. A drawback is that the processing load increases
several times over conventional frequency domain processing with
FFT's and CFAR's, but the payoff is greater in the enhanced
discrimination of closely spaced returns. Examples are small
targets in the presence of a large target or jammer, and moving
target indicators. Figure 42 summarizes these concepts.

Incidentally, one of the differences between the
various  batch  type  adaptive  discrete  filtering  schemes  is  in 
the  assumptions  made  about  the  data  behavior  outside  the  batch 
interval.  For  example,  the  maximum  likelihood  method  when  used 
to  operate  an  autoregressive  filter  assumes  that  the  data  is 
zero  outside  the  interval.  The  maximum  entropy  method  makes  no 
assumptions  whatsoever.  In  contrast  the  standard  FFT  assumes 
that  data  is  periodic  outside  the  region  given,  with  a funda- 
mental period  equal  to  the  batch  size. 

In  communications,  a Vocoder  represents  one  of 
the  more  fruitful  areas  where  the  Micro  Signal  Processor  could 
be  applied.  At  least  two  applications  of  this  type  have  been 
documented. Weinstein [1]* described the use of the Lincoln
Labs  Fast  Digital  Processor  as  a Linear  Predictive  Vocoder. 
Goldberg  and  Arcese  [2]  showed  that  adaptive  predictive  encoding 
could  be  done  using  the  Sylvania  Programmable  Signal  Processor. 

These Vocoders convert analog speech into a digital
representation for transmission on a communications channel.
During processing they compress the speech and reduce the bit
rate on the channel by a factor of about 10 while maintaining
the speech quality.

There are two primary motivations for digitizing
the speech signal. First it greatly simplifies and economizes


* Note: References indicated are those found in Section 3.3.5.6
BATCH TYPE: AUTOREGRESSIVE/MAXIMUM ENTROPY FILTERS

   APPLICABLE TO: SPEECH COMPRESSION
                  SAR MAPPING
                  FREQUENCY TRACKING

   YIELDS IMPROVED FREQUENCY RESOLUTION RELATIVE TO BATCH SIZE
   USED TO LOCATE SPECTRAL COMPONENTS OF COMPLEX SIGNALS

CONTINUOUS: KALMAN FILTERS

   APPLICABLE TO: NAVIGATION
                  STABILIZATION
                  TRACKING

   USED TO ESTIMATE PARAMETERS OF SIGNAL CORRUPTED BY NOISE
   ESTIMATED PARAMETERS: FREQUENCY, RANGE, ANGLE, AMPLITUDE


Figure 42 - Adaptive Discrete Filtering

any repeaters required in the system. Second it admits the use
of encrypting to obtain the advantages of a secure communications
channel. In this study, one form of Vocoder is postulated
and the processing load imposed by it on the signal processor
is evaluated. This system follows the work described by
MARKEL [3, 4, 5, 6, 7].


3.3.5.2 System Concept

A Vocoder system consists of: a) a voice digitizer
that accepts speech and converts it to a compressed
digital representation, b) a digital communications channel that
carries the digital message from the originator to a receiver,
and c) a voice synthesizer that synthesizes a speech signal
from the digital representation. These elements are arranged
into a system as shown in Figure 43.




Figure  43  - Vocoder  System 


The  ability  to  reduce  the  bit  rate  required  for 
the  speech  is  based  on  the  model  of  the  vocal  mechanism  shown 
in  Figure  44. 




Figure  44-  Model  of  Vocal  Mechanism 

In this model speech is assumed to be either a
harmonic signal whose spectrum is shaped by the vocal tract
filter for voiced sounds, or white noise that is shaped by the
vocal tract filter for unvoiced sounds. Although speech contains
high frequency signals, the vocal mechanism can only
modulate these signals at some low rate.

If for this system we can determine signal amplitude,
voiced or unvoiced, pitch if voiced, and about 12 vocal
tract parameters as a function of time, realistic voice
reproduction can be realized at the receiving station.


In the operation of the system the speech is
filtered, sampled, and A/D converted as a first step in
operation. The speech samples are then partitioned into segments
about 25 milliseconds long. For each of these segments, the
various model parameters are extracted, encoded and transmitted.

The receiving station is described in Figure 43.
Here, a harmonic generator, a noise generator, and a
voiced/unvoiced switch are controlled by the appropriate model
parameters. The resulting output is filtered by an all pole
filter whose characteristics are determined by the vocal tract
parameters.

3.3.5.3 Vocal Tract Filter Coefficients

The vocal tract filter can be represented (to the
accuracy required, as demonstrated by experiments) by an all
pole filter. The locations of the poles, or their equivalent, are
the vocal tract parameters. Conceptually, the pole locations can
be determined by synthesizing a zeros only filter whose output
is white noise when driven by the speech signal. The filter so
synthesized is a linear predictive or autoregressive filter.

Three major steps are used in extracting and preparing
the vocal tract parameters for transmission. First the
autocorrelation function for each set of input samples is
generated. This calculation can be accomplished using FFT
procedures, or by an accumulation of products procedure. For the
size of the sets, and the number of autocorrelation coefficients
extracted, the latter procedure is slightly more economical.
In either case, this computation is the major portion of the
computing load.
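
The accumulation of products procedure can be sketched as follows, together with a count of the multiplies it needs. The segment size used in the count (200 samples, from 8 kHz sampling over 25 milliseconds) and the number of lags (13, for about 12 parameters) are assumptions pieced together from figures quoted elsewhere in this section; the function names are hypothetical.

```python
def autocorr_accumulate(x, max_lag):
    """Autocorrelation for lags 0..max_lag by direct accumulation of products."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def multiplies_needed(n_samples, max_lag):
    """Real multiplies used by autocorr_accumulate for one segment."""
    return sum(n_samples - k for k in range(max_lag + 1))


print(autocorr_accumulate([1, 2, 3], 2))
print(multiplies_needed(200, 12))
```

Because only a few lags are needed, the direct method costs a small multiple of the segment length, which is why it can compete with FFT procedures at this size.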

In the second step, the coefficients of the
autoregressive filter are determined. The autoregressive filter
has a transfer function

    H(z) = \sum_{i=0}^{M} a_i z^{-i}

The set of coefficients {a_i} of this filter is
determined by an algorithm of the type shown in Figure 47.
This algorithm is a simplified version of Robinson's algorithm
[8] as developed by MARKEL and GRAY [5].
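
A recursion of this general type is sketched below. It is the standard Levinson-Durbin recursion, offered as an illustration and not as the exact algorithm of Figure 47 or of references [5] and [8]. It assumes the leading coefficient is normalized to 1; the function name and the sign convention of the intermediate coefficients are choices made here.

```python
def levinson(r, order):
    """Solve for autoregressive filter coefficients from autocorrelations.

    r     : autocorrelation values r[0..order]
    order : filter order M
    Returns (a, ks, err): coefficients a[0..M] with a[0] = 1, the
    per-stage reflection coefficients, and the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for m in range(1, order + 1):
        acc = sum(a[j] * r[m - j] for j in range(m))
        k = -acc / err              # the one divide per stage
        ks.append(k)
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
    return a, ks, err


coeffs, refl, residual = levinson([1.0, 0.5, 0.25], 2)
print(coeffs, refl, residual)
```

The structure shows why the load is mostly real multiplies and adds with an occasional divide: each stage uses one divide and a number of multiply-adds proportional to the stage index.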

The third step is to prepare the coefficients
for transmission. Fettweis [9] showed that digital wave filters
require fewer digits for the multiplier coefficients than more
conventionally structured IIR filters. By transforming from the
filter coefficients {a_i} to wave filter parameters {k_i}, the
amount of data required for transmission can be reduced. An
algorithm for making this transformation was described by GRAY
and MARKEL [6].

From a signal processor viewpoint, these several
algorithms consist primarily of real adds and multiplies with
an occasional divide. The algorithms have a regularity that
makes them amenable for use in a signal processing structure.
In the Robinson Algorithm, Markel and Gray [5] showed by
simulation that 19 to 22 bits will be required in floating point
calculations to ensure the stability of the filters.

The wave filters as described by {k_i} are precisely
equivalent to the "PARCOR COEFFICIENT" derived by ITAKURA and
SAITO [10]. Therefore, that form of voice processing to determine
vocal tract parameters is included within the signal
processing structure being examined.
3.3.5.4 Other Parameters


The other parameters of the system include: a
voiced/unvoiced decision, pitch for voiced sounds, and signal
level. The general procedure is to process extensively to get
a good chance of making a correct choice and then clean up
the results using a set of logical decisions. The processing
up to this last point is of the type normally done in signal
processing. The logical processes will require either a small
GP computer, or a look-up table decision network. The processing
steps, which follow Markel and Gray [7], include the following:

1.  Test the zero crossing rate of the input samples at an
8 kHz rate. If less than 2 crossings per millisecond, tag
the segment as unvoiced.

2.  Low pass filter the signal using a 3 pole Chebyshev filter
with an 800 Hz cut off frequency. Decimate the output
sampling rate to 2 kHz to form the test segment.

3.  Remove any bias that is present in the test segment.

4.  Measure the power level of the segment. Tag segment as
silent if a preset threshold is not exceeded.

5.  Form the first four autocorrelation coefficients of the
bias free test segment. Use accumulated products.

6.  Use the Robinson Algorithm to obtain a four coefficient
autoregressive filter.

7.  Filter the bias free test segment using the autoregressive
filter.

8.  Form the autocorrelation function of the residue from the
autoregressive filter. Use FFT procedures.

9.  Locate the peak of the autocorrelation function. This
is a coarse indication of pitch period.

10.  Interpolate using the autocorrelation peak and the adjacent
two samples at six intermediate points.

11.  Pick the maximum from the interpolated values as the location
of the pitch period.

12.  Test the maximum against a threshold for a voiced/unvoiced
decision.

13.  Do logical tests to clean up.

3.3.5.5 Processing Load

The total processing requirements thus imposed
by the algorithms just described are indicated in Figure 45.
Further breakout of the "other" parameter processing is
presented in Figure 46. Note that the rates are more than an
order of magnitude below that achievable by a single μSP. Hence,
a number of options are possible:

• Multiplex many such channels through one unit

• Squeeze this processing onto a μSP for another task

• Consider using some of these adaptive algorithms for
higher thruput sensors, namely radar
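
The per-second totals in Figure 46 follow from the per-frame totals and the segment length. The sketch below assumes one frame per 25 millisecond segment (40 frames per second), which is consistent with the printed totals; the function name is hypothetical.

```python
def per_second(ops_per_frame, frame_ms=25):
    """Operations per second when one frame is processed every frame_ms."""
    return ops_per_frame * 1000 // frame_ms


print(per_second(3290))   # multiplies per second
print(per_second(4460))   # adds per second
```

At these rates the "other parameter" processing is a few hundred thousand operations per second, which is the basis for the multiplexing options listed above.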

More detailed insight into the algorithms is
provided by Figures 47 and 48. The former shows the formulae
for the autoregressive filter while the latter shows the
formulae for the Burg algorithm for maximum entropy calculations.

The digital lattice filter form is part of the
vocoder scheme just described. It was invoked because of
claims of minimizing coefficient bits. However, further literature
search has revealed that the more conventional form of
IIR filter, namely cascaded two-pole, two-zero sections, is as
good in that property. Hence, the law of simplicity is invoked,
and reliance on the conventional form will be assumed.
Figure 45 - Linear Predictive Vocoder


PROCESSING STEP                   | MULTIPLIES PER FRAME | ADDS PER FRAME
----------------------------------+----------------------+---------------
 1.  ZERO CROSSING TEST           |                      |     200
 2.  LOW PASS FILTER              |      600             |     600
 3.  BIAS REMOVAL                 |                      |     100
 4.  POWER LEVEL                  |       50             |      50
 5.  FOUR AUTOCORRELATION COEF    |      200             |     200
 6.  ROBINSON ALGORITHM           |       20             |      20
 7.  AUTOREGRESSIVE FILTER        |      200             |     200
 8.  AUTOCORRELATION VIA FFT      |     1900             |    3000
 9.  LOCATE PEAK                  |                      |      50
10.  INTERPOLATE                  |       20             |      20
11.  PEAK PICK                    |                      |
12.  THRESHOLD                    |      NIL             |      20
13.  LOGIC                        |                      |     NIL
----------------------------------+----------------------+---------------
TOTAL PER FRAME                   |     3290             |    4460
TOTAL PER SECOND                  |  131,600             | 178,400

Figure  46  - Other  Parameters  Processing  Load 
INPUT:  1.  A set of samples x_n (n = 0, 1, ..., N-1)
        2.  The order of the filter M

OUTPUT: The set of filter coefficients

A)  Compute autocorrelation function of the input data.

        \sum_{n=0}^{N-1-|k|} ( x_n x_{n+|k|} )

B)  Using the initial conditions

Figure 47 - Algorithms for Autoregressive Filter
Figure 48 - Calculations For Maximum Entropy Filter*


* Reference: N. Andersen, "On the Calculation of Filter Coefficients
for Maximum Entropy Spectral Analysis," Geophysics,
Vol. 39, No. 1 (Feb. 1974), pp. 69-72.
3.3.6 RF Signal Sorting and Classification

This last example is from ECM systems, and explores
the need for high speed signal sorting rather than number
crunching. The purpose is to determine the PRI and PRI type (constant,
staggered, ...) of hundreds of simultaneous emitters. The
system block diagram is just Figure 3-1 without a transmitter.
An omnidirectional antenna feeds a receiver where the pulse
frequency (F), angle of arrival (AOA), and time of arrival (TOA)
are assembled into a pulse descriptive word (PDW). The signal
processor screens PDWs into the following emitter types based
on F, AOA, and previous PDWs:

• New  emitters  are  stored  with  an  activity  count 
of  1 

• Old  ones  have  their  activity  count  incremented 
and  new  TOA  saved 

• N'th  time  ones  have  their  F,  AOA  and  all  TOAs 
go  to  the  data  processor 

• Beyond  N times,  the  PDW  is  ignored 

The data processor then does the actual PRI and PRF type
calculations at msec rates, compared with signal processor inputs
at µsec rates.

A signal sorter implementation using content addressable memory
is very powerful but not recommended for this study. These special
components lack the intensive commercial investment activity of RAMs.

The approach initially preferred here is based on hash coding of
the F and TOA part of a PDW to address a RAM. The extra steps
required for multiple hits of the same address mapping are minimized
by intelligent choice of mapping and by keeping memory loading below
half capacity.
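
The hash-coded table and the activity-count screening rules listed earlier can be sketched in a few lines. This is a minimal illustration, not the study's design: the table size, the quantization of F and AOA into integer bins, the simple multiplicative hash, linear probing, and the names `PdwSorter` and `screen` are all assumptions introduced here. The key is formed from F and AOA because those are the fields the screening rules match on; that choice is also an assumption.

```python
class PdwSorter:
    """Hash-addressed emitter table with linear probing (illustrative sketch)."""

    def __init__(self, slots=64, n_report=4):
        self.slots = slots            # RAM size in words (assumed)
        self.n_report = n_report      # N: activity count that triggers a report
        self.table = [None] * slots   # each entry: [key, count, [TOAs]]
        self.used = 0

    def _hash(self, key):
        f_bin, aoa_bin = key
        # Simple mix of the two fields (assumed mapping, not from the study).
        return (f_bin * 31 + aoa_bin * 17) % self.slots

    def screen(self, f_bin, aoa_bin, toa):
        """Return (F, AOA, TOAs) on the N'th hit, otherwise None."""
        key = (f_bin, aoa_bin)
        i = self._hash(key)
        # Linear probing resolves multiple hits on the same address.
        while self.table[i] is not None and self.table[i][0] != key:
            i = (i + 1) % self.slots
        entry = self.table[i]
        if entry is None:
            # New emitter: refuse to load the memory beyond half capacity.
            if 2 * (self.used + 1) > self.slots:
                raise MemoryError("table loading would exceed half capacity")
            self.table[i] = [key, 1, [toa]]
            self.used += 1
            return (f_bin, aoa_bin, [toa]) if self.n_report == 1 else None
        if entry[1] >= self.n_report:
            return None               # beyond N times: the PDW is ignored
        entry[1] += 1                 # old emitter: bump count, save new TOA
        entry[2].append(toa)
        if entry[1] == self.n_report:
            return (f_bin, aoa_bin, list(entry[2]))
        return None
```

Keeping the table at most half full guarantees that a probe sequence always reaches an empty slot, so the search loop terminates after a short walk in the typical case.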

This second approach is preferred because it uses conventional
RAMs and micro processor CPU slices. However, the computer which
does such sorting is still fundamentally different from a number
cruncher. Fast, efficient handling of this tag mapping and matching
at first appeared to require a unique building block. If this were
true, the remaining elements in this application could still be
common to those of filtering type missions. The approach finally
developed emphasizes the direct list processing capability of the
µSP. A feature is included in the address generation mechanism to
allow a data word to later serve as an address pointer to another
data word, etc. Such list structures can branch or terminate upon
meeting appropriate arithmetic conditions.
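
The pointer-following feature can be pictured as a data memory in which one field of each word is the address of the next word. The sketch below is only an illustration: the two-field word layout, the `END` marker, and the stopping rule (stop when a running sum reaches a limit) are assumptions chosen to show a list that terminates on an arithmetic condition.

```python
END = -1  # assumed end-of-list marker


def walk_list(memory, start, limit):
    """Follow next-pointers from `start`, summing values until the
    running sum reaches `limit` or the list ends.

    memory: list of (value, next_address) words.
    Returns (visited_addresses, running_sum).
    """
    visited, total, addr = [], 0, start
    while addr != END:
        value, next_addr = memory[addr]
        visited.append(addr)
        total += value
        if total >= limit:      # arithmetic condition terminates the walk
            break
        addr = next_addr        # the data word supplies the next address
    return visited, total
```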

This ability to work with pointers to data subsets has
applicability far beyond the ECM problem. Signal post processing
is expected to head in this direction as more experience is
gained by analysts with the unique abilities of digital signal
processors. Finally, the ability to expand and compress data
sets is one of the key useful functions to emerge from recent
examinations of the strengths of the first generation of vector
computers such as CDC STAR and TI ASC.

3.3.7  Processing  Modes 

The µSP is expected to be capable of performing the functions
listed in Table 13. This table is based on the previous analysis,
and also summarizes Raytheon's experience over the past six years
of applying programmable signal processors. The degree of efficiency
and proficiency at any given function can vary, depending upon the
frequency of execution required as well as the particular
configuration of µSP elements chosen to satisfy that particular
application. Further breakdown of these algorithms into more basic
elements occurs under the next section, Functional Analysis.

TABLE 13

EXPECTED µSP ALGORITHMS

• CORNER TURNING OF DATA

• FFT, FFT^-1

• HETERODYNE

• BINARY PHASE CODE CORRELATION

• FIR - CONTINUOUS

• FIR - SUM AND DUMP

• IIR

• MAGNITUDE AND INTEGRATE

• AUTOREGRESSIVE/MAXIMUM ENTROPY FILTER

• CFAR AND THRESHOLDING

• DATA SORTING

• PSEUDO RANDOM NUMBER GENERATION

• KALMAN FILTERING


3.4 Functional Analysis

Missions can be broken down into their constituent algorithms,
as is done in Figure 49 for radar missions. As we have seen in the
previous sections, similar constituent algorithms can be found in
the other missions analyzed. This section examines those fundamental
algorithms in more detail in order to derive design restrictions.

Figure 49 - Functions Composing Missions


3.4.1  A/D  Converter 


Some key interface design parameters emerge from the need to
convert analog sensor data into digital words. Conversion rates
below 1 MHz allow one small package to produce an 8 bit word, even
when all those bits are not needed. Any extra bits produced can
reduce the need for analog AGC, through adoption of some degree of
block scaling up to true floating point.

Conversion speeds in the MHz range are sustainable with even a
µSP if sufficient buffering is provided. With a low duty cycle, the
processing time limitation is then set by the average input rate
rather than the peak input rate. Alternately, if the information
bandwidth is small enough, buffering can make batch-mode
demodulation feasible on even high duty cycle inputs.

The µSP should be capable of easily processing complex or real
data. Complex data (in-phase and quadrature) conversion is normally
used on radar systems because this reduces bandwidth requirements.
Image processing uses real data initially. With the exception of
Fourier transforms, the processing tasks that follow in most cases
which use complex data are equivalent to processing with two
real-channel processors. Hence our recommendation that the µSP be
equally efficient with complex or dual real formats.

The connections between the converter and the µSP are thus
possible at any of three places:

• an  intermediary  buffer  memory 

• directly  into  the  minimal  configuration 

• via  the  GP  I/O  bus 

Provisions to accommodate data introduction and removal at any
or all of those three positions have been made in the proposed
design. The first connection, via buffering, allows
load-while-process operations of the simplest kind. The second
minimizes chips for cases where only small amounts of input
buffering are needed. The third connection allows the GP computer
to share the analog interface elements at a cost in programming
complexity.

3.4.2  Pulse  Compression 

Commonly used forms of pulse compression include Barker code
and linear FM, with frequency and/or time weighting controlling
sidelobe levels. The algorithm used depends on both kernel size
and signal processor flexibility. For example, linear chirp
convolvers are not as efficient as Fourier techniques for sizes
over 50 points or so. Binary codes require fewest adds/subtracts
when processor data addressing can be flexible. The memory capacity
required may exceed twice the kernel size if continuous compression
through overlapped processing is desired.
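
Because a binary phase code has chips of +1 or -1, each output of the compressor is a sum and difference of input samples with no multiplications. A minimal sketch, assuming the 13-chip Barker code and real-valued samples (both choices made here for illustration; the function name is hypothetical):

```python
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def binary_code_compress(samples, code=BARKER_13):
    """Correlate `samples` against a +/-1 code using only adds and subtracts."""
    n_out = len(samples) - len(code) + 1
    out = []
    for start in range(n_out):
        acc = 0
        for k, chip in enumerate(code):
            if chip > 0:
                acc += samples[start + k]   # chip +1: add the sample
            else:
                acc -= samples[start + k]   # chip -1: subtract the sample
        out.append(acc)
    return out
```

Feeding the code itself, padded with zeros on both sides, yields a single compressed peak equal to the code length, with the small sidelobes characteristic of a Barker code.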

The efficiency of a µSP, in terms of the ratio of overhead logic
to needed multiplications, additions, and storage, must be high.
Otherwise, a special-purpose pulse compression unit will pop up in
many applications and thereby reduce the volume of µSPs produced.
Furthermore, if the relative advantage of a pulse compressor box is
too great, other calculations, such as map correlation and arbitrary
filtering, will be converted to exploit the pulse compressor.

3.4.3  Range  Sample  Compression 

Combining several range samples at the start of processing
drastically reduces the computation load. This is possible when
the range resolution of the system exceeds the resolution required
of the current mode. Methods vary from simple integration to
complicated adaptive moving target indicator (AMTI) filters.
Figure 50 shows a three point MTI, with its need for 4 real
multiplies and 4 real subtractions per input point.

Figure 50 - MTI


3.4.4  Roughing  Filters 

Roughing filters reduce signal bandwidths, and thus reduce the
required sampling rate. In the past, typical implementation has
been by recursive filters such as combinations of the two pole,
two zero module shown in Figure 51. Today the finite impulse
response (FIR) filter is replacing such IIR types of filters by
offering more performance and/or less computation. Particularly
with a µSP, the number of computations required for a FIR filter
such as shown in Figure 52 can be manipulated depending on kernel
length, bandwidth reduction, and configuration cleverness.
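
One way the computation count of a roughing filter can be manipulated is to compute only the outputs that survive the sample-rate reduction. A sketch, assuming a real FIR kernel `h` and an integer reduction factor `d` (names introduced here, not taken from Figure 52):

```python
def fir_decimate(x, h, d):
    """FIR filter followed by keep-every-d-th-sample, computing only kept outputs.

    Costs len(h) multiply-adds per output, i.e. len(h)/d per input point.
    Only outputs for which the kernel fully overlaps the input are produced.
    """
    n_taps = len(h)
    out = []
    for start in range(0, len(x) - n_taps + 1, d):
        acc = 0.0
        for k in range(n_taps):
            # Convolution sum y[m] = sum_k h[k] * x[m - k], m = start + n_taps - 1
            acc += h[k] * x[start + n_taps - 1 - k]
        out.append(acc)
    return out
```

With a 4-tap averaging kernel and `d = 4`, each input point is touched once, so the cost is one multiply-add per input sample instead of four.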

3.4.5  Spectral  Analysis 

The FFT for spectral analysis has emerged as a signal processing
fundamental. Actually the FFT or FFT^-1 is just a collection of FIR
filters whose common calculation is very efficient. Figure 53 shows
a base two configuration, or "butterfly," for an FFT stage. For a
µSP, higher bases add too much complexity per stage to compensate
for their overall computation reduction. Multiple arithmetic units
can increase the FFT thruput, provided memory modularity constraints
don't add too much overhead.

Figure 53 - FFT

The FFT calculation lends itself to many permutations, such as
multiplier position before, rather than after, the add-subtracts,
and coefficient rearrangement. A possible problem is the
bit-reversed order of the FFT output. This has been solved by making
the control for the µSP capable of reordering up to 8 stages of FFT
in one step, simultaneous with the next processing operation.
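
A radix-2 FFT that takes its input in natural order delivers its output in bit-reversed order. The reordering amounts to reversing the bits of each index, as in the sketch below. This is a software illustration only; the µSP performs the equivalent in its address control, and the function names are assumptions.

```python
def bit_reverse(index, stages):
    """Reverse the low `stages` bits of `index`."""
    result = 0
    for _ in range(stages):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def reorder(data, stages):
    """Return `data` (length 2**stages) rearranged from bit-reversed to natural order."""
    if len(data) != 1 << stages:
        raise ValueError("length must equal 2**stages")
    return [data[bit_reverse(i, stages)] for i in range(len(data))]
```

Eight stages correspond to a 256-point transform, for which index 1 maps to index 128.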

Another possible problem is efficient use of the FFT on real
input data, since the FFT is defined only for complex data. Here
Raytheon has developed a method of computing a real N point
spectrum with close correspondence to the flow for computing a
complex N/2 point spectrum. Such techniques affect the AU control
design flexibility.
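
Raytheon's method is not described in this section. The widely known technique sketched below has the same flavor: pack the even and odd real samples into the real and imaginary parts of an N/2 point complex sequence, transform once, then separate the two spectra in one extra pass. It uses NumPy's `numpy.fft.fft` for the complex transform; the function name `real_fft_via_half_size` is an assumption.

```python
import numpy as np


def real_fft_via_half_size(x):
    """N-point spectrum of real x (N even) from one N/2-point complex FFT.

    Returns bins 0..N/2; the remaining bins follow by conjugate symmetry.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    half = n // 2
    z = x[0::2] + 1j * x[1::2]          # even samples -> real, odd -> imaginary
    zf = np.fft.fft(z)                  # one N/2-point complex FFT
    zf = np.append(zf, zf[0])           # bin N/2 wraps around to bin 0
    zr = np.conj(zf[::-1])              # conj(Z[N/2 - k]) for k = 0..N/2
    even = 0.5 * (zf + zr)              # spectrum of the even-indexed samples
    odd = -0.5j * (zf - zr)             # spectrum of the odd-indexed samples
    k = np.arange(half + 1)
    twiddle = np.exp(-2j * np.pi * k / n)
    return even + twiddle * odd         # final radix-2 combining pass
```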

3.4.6  Walsh/Hadamard  Transform 

The Fast Walsh/Hadamard Transform (FWT) is similar to the Fast
Fourier Transform. The FWT is more economical because all of its
multiplications are by ±1. There is a question as to how much
importance should be attached to the FWT in the Micro Signal
Processor Study. Based on the discussion below, the FWT is of
limited importance and does not influence the µSP features. An
FWT can always be done as an FFT with multiplications set to ±1.
These remarks are valid regardless of whether the FWT is used for
spectral analysis, convolution, or communication.

Spectral analysis has been used on radar systems to (a) improve
the S/N ratio of narrow band targets immersed in noise (e.g.,
pulsed doppler radar), and (b) resolve targets in SAR applications.
The major deterrent to the use of the FWT in these applications is
the problem of interpreting the difference between the Fourier
frequency response and the Walsh sequency response. Particularly
confusing is the fact that a change in position of a signal has
almost no effect on the frequency response while it significantly
changes the sequency response.

The class of functions over which the FWT can be used for fast
convolution is more restricted than the class of functions for
which the FFT can be used. For radars this set of usable functions
is much too restricted. The application of the FWT to communication
problems has appeared frequently in the literature (see bibliography
in N. M. Blachman, Sinusoids versus Walsh Functions, Proc. IEEE
Vol. 62, pp. 346-354, 1974). Despite these attempts to use the FWT
in communications, it has not been applied to any extent in working
systems the way the FFT has.

In communications systems, there has been a continual quest for
methods that reduce the bandwidth of transmitted information. This
occurs because the number of transmission channels is limited, as
in radio, or expensive, as in telephony. By increasing the cost of
terminal equipment, it is hoped to reduce the bandwidth and obtain
better utilization of the channel.

Of immediate concern is a comparison of the FWT as against the
FFT in reducing signal bandwidth while giving consideration to the
relative cost of these two approaches. Interpreting Blachman's
results (Proc. IEEE Volume 62, p. 347, 1974), under the most
favorable conditions and for the same accuracy of transmission,
the FWT requires 150% of the bandwidth required by the FFT. This
comparison is less favorable to the FWT when the most favorable
conditions are not present. At the same time, if one should delete
all the multiplies from a generalized signal processing system, he
would save only about 20% of the hardware complexity. Such a trade
off is not attractive enough to bring about extensive application
of the FWT in communications systems.
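
The remark that an FWT is an FFT with multiplications set to ±1 can be seen from the butterfly itself: each stage forms only a sum and a difference. A minimal sketch, producing the result in natural (Hadamard) order; the function name is an assumption:

```python
def fwht(data):
    """Fast Walsh/Hadamard transform, Hadamard order.

    len(data) must be a power of two. Uses only adds and subtracts.
    """
    a = list(data)
    n = len(a)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    span = 1
    while span < n:
        for block in range(0, n, 2 * span):
            for i in range(block, block + span):
                u, v = a[i], a[i + span]
                a[i] = u + v          # FFT butterfly with the multiplier fixed at +1
                a[i + span] = u - v   # and at -1
        span *= 2
    return a
```

Applying the transform twice returns the input scaled by the length, which gives a quick check of an implementation.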

An alternative to the FFT for spectral analysis is the Maximum
Entropy Filter. The latter is noted for its ability to discriminate
between two closely spaced targets. Its computational requirements
for N real input points and M targets (e.g., iterations) are about
4NM real multiplies, 4NM real adds and M divides. For M \geq 2 and
N \leq 2^{2M}, the FFT approach requires fewer calculations.


3.4.7  Magnitude  and  Integration 

Magnituding removes the phase angle information from complex
data. Integration builds up the signal strength by averaging over
several turns. Figure 54 illustrates the arithmetics involved,
based on a magnitude approximation using the larger and smaller of
the real and imaginary components. For greater accuracy,
particularly with coordinate conversions, a simple trig function
technique can give better than 1 percent accuracy. Such a concept
has been refined into a novel scheme, with a patent being applied
for, to do magnituding in the µSP by angle rotation. Provisions
exist for iteration to even higher accuracy.


NOTE:  WHEN α =      AND β =      LARGEST ERROR =

         1            1/2           8.6%

         0.955        0.-<M         4.5%

Figure  54  - Magnitude  and  Integrate 


3.4.8  Adaptive  Threshold  Generation  (CFAR) 

Sliding window averages, or CFARs (Constant False Alarm Rate),
compute the local noise background for setting target detection
thresholds. The probability of an incorrect judgment, or false
alarm rate, can thus be kept fairly constant over a large search
area. Figure 55 shows the arithmetic computations involved, with
typical window sizes of 8, 16, 32 and 64 points. A target exceeding
a relative threshold is then tagged with a range and doppler
identifier.
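
The sliding-window idea can be sketched as a cell-averaging detector. The arrangement below is an assumption for illustration, not necessarily the structure of Figure 55: the window is split evenly on either side of the cell under test, there are no guard cells, and the threshold is a multiple `scale` of the local mean. A running sum is updated as the window slides, so each new cell costs a fixed number of adds.

```python
def cfar_detect(power, window=8, scale=4.0):
    """Return indices whose value exceeds `scale` times the local average.

    The average uses window/2 cells on each side of the cell under test;
    cells too close to either end of the data are skipped.
    """
    half = window // 2
    n_ref = 2 * half
    hits = []
    if half == 0 or len(power) < n_ref + 1:
        return hits
    # Sum of the reference cells around index `half`, excluding the test cell.
    total = sum(power[0:half]) + sum(power[half + 1:n_ref + 1])
    last = len(power) - half - 1
    for i in range(half, last + 1):
        if power[i] * n_ref > scale * total:
            hits.append(i)
        if i < last:
            # Slide by one: cell i joins the lagging half, the oldest lagging
            # cell leaves; cell i+1 leaves the leading half, a new cell enters.
            total += power[i] - power[i - half]
            total += power[i + half + 1] - power[i + 1]
    return hits
```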


Figure  55  - CFAR 


Even more complicated forms of CFAR algorithms exist, but in
our experience they don't drive the design of the arithmetic
configuration because their thruput is low. However, for these
operations the µSP really excels in system development time,
algorithm flexibility, and small size compared with a GP computer
or a hardwired digital approach.

3.4.9  Signal  Processor  Implications 

Supplementing the above functional analysis with sample system
sizes yields the relative amount of arithmetics and storage
required for each algorithm. These are shown in Table 14. Note that
despite the varying number of real adds and real multiplies
required per input point, the ratio of the two is mostly between
one and two adds per multiply. Because of multiplier costs and
thruput needs, we recommend that the µSP Arithmetic Unit contain
one real multiplier and two real adders. Furthermore, the
Arithmetic Unit should use a multiclock macro instruction as done
in Raytheon's MINI GPSP. This would allow a minimum number of
arithmetics to execute the most basic functions, such as in
Figures 45 through 54, in one pass of the data through the AU. The
less frequently called-upon algorithms can take more steps, if need
be, to keep the AU design simple without significantly reducing
overall performance.

TABLE 14

SAMPLE COMPUTATION REQUIREMENTS DERIVATION

Adding the functions derived in the above analysis to the
functions already implemented in Raytheon's programmable signal
processors yields the list of desirable algorithms given in
Table 15. Elimination of multiple variations of the same type of
computation leaves the abbreviated list presented earlier in
Table 13.

TABLE  15 

GENERAL SIGNAL PROCESSING FUNCTIONS

Vector negate
Vector add
Vector subtract
Vector multiply
Vector divide
Vector-scalar add
Vector-scalar multiply
Vector normalize
Vector rescale
Sum of vector elements
Dot product of 2 vectors
Max. element of vector
Min. element of vector
Vector max. magnitude
Vector min. magnitude
Complex vector multiply
Complex vector reciprocal
Complex vector magnitude
Complex conjugate
Complex FFT
Real FFT
Inverse FFT
Bit reverse order an array
Convolution
Correlation
Wiener-Levinson algorithm
Burg algorithm
Bandpass filter
Power spectrum
Complex spectrum
FIR
FIRN
FIRPZ
IIR
Cross spectrum
Transfer function
Coherence
Sliding window sum
CFAR

3.5 Wordlength Analysis


3.5.1  Introduction 

Usually wordlength studies are directed to a specific
application. For the Micro Signal Processor we are attempting to
establish bounds on wordlengths for a machine of general
capability. In this broader context, we must consider classes of
filters and ranges of performance levels. Our task is somewhat
eased by implementation considerations, because digital words are
conveniently available in 4 bit sections.

We shall start with a view of the importance of having a
variety of word formats to do computations within a fixed word
size. As digital arithmetic and memory have become cheaper, limited
bit processing is no longer justifiable in the performance cost
tradeoffs. Furthermore, multimode processing requirements add to
the need for including multiple format words within a programmable
processor. These formats include:

• Single  precision  complex 

• Double precision integer

• Single  precision  floating  point 

• Double  precision  floating  point 

• Block  floating  point 

The variable word formatting allows the maximum dynamic range to
be carried where needed. More generally, the unit is no longer
tailored to one task, but can be applied to many of the functions
which are required for the mission.

Word lengths are established on the basis of filter sidelobe
levels and dynamic range. Based on material supplied below, a word
length of 12 bits is required to support applications requiring
dynamic range and sidelobe levels in the 50 db class. An extended
word length of up to 24 bits should also be available to
accommodate those situations where high processing requirements
are encountered.

3.5.2 Generalizations

3.5.2.1 Fixed Point Single Precision

This is a format used in smaller machines and is usually the
smallest word size format. It is used in processing data in
conjunction with a predetermined scaling sequence.

Its main use is in algorithms where pair-wise summations take
place and interest in the output lies in the extraction of a target
from a noise background. An example of such an algorithm is the
FFT. The signal to noise improvement of pair-wise summations allows
scaling and word truncation to take place in such a way as to
maintain the same word size and still extract the targets of
interest.

3.5.2.2 Double Precision

This format uses two of the single precision words to represent
one large word. In signal processing, since most words travel as
complex pairs, the single precision format usually carries two of
the vector components. The double precision format uses both these
words to carry one vector component with twice the precision.

The main use of the format is where truncation produces
unacceptable errors. One such place is where a large accumulation
of samples takes place to generate an average. Such an average
might be used to measure background noise for the purposes of
establishing a detection threshold. The data cannot be truncated or
scaled until all points have been summed, if they are to have equal
weight and accuracy in their effect on the final answer.

One other case where truncation is unacceptable is in various
stages of SAR map processing. The map information in this case
contains energy in a wide spectrum of frequencies. It is therefore
noise-like in the way it passes through an FFT. There are cases
where a strong reflector, such as a water tower, dominates the
receiver dynamic range. Such a reflector acts as a false target. We
are not interested in this target, but rather in the fine-grain
detail of the map, which looks like noise. We therefore cannot
truncate this noise as we previously did when we were only
interested in the dominant target. Word growth has to be allowed
until noise growth takes place. In large size FFTs this requires
double precision in the later stages of the FFT process. This is a
similar problem to multiple target detection.

3.5.2.3 Floating Point

This format carries a complex mantissa and a common exponent.
Scaling in this format is data dependent, with shifting down and
truncating occurring as data grows. One area where this becomes
very useful is in weighted accumulate-and-dump filters such as
clutter cancellers or FIR filters where we are interested in target
detection. Such accumulations would have to allow extensive word
growth if we were interested in the accuracy of the answer, such as
in the noise average. However, when such growth occurs we are
dealing with large targets where gain and cancellation accuracy are
not important. Such targets are easily detected. Since we are
dealing with one cell, the presence of a large target will mask any
small targets present in the same cell. With just small targets
present, truncation need not occur and cancellation accuracy is
therefore preserved.

Another area where floating point is important is in
multiplication and division of video data by video data. One such
example occurs in monopulse ratio calculation used for target
tracking and ground moving target detection. Here the floating
point is used in a normalization mode which moves the significant
information toward the most significant bits of the word. This is
done prior to multiplication or division in order to eliminate the
need for increasingly large word sizes to hold answers.

3.5.3  Dynamic  Range 

3.5.3.1 Input Dynamic Range

We treat input dynamic range for completeness. The performance
of a digital signal processor is more properly controlled by the
required output dynamic range. The input dynamic range is set by
the A/D converter sourcing the data. Depending on the problem this
ranges from 1 to 14 bits.

3.5.3.2 Output Dynamic Range

The broadest concept of output dynamic range is set by the input
dynamic range coupled with any signal to noise gain incurred during
processing. In consideration that signal to noise improvements of
30 to 40 db are not uncommon, we can expect worst case output
dynamic ranges of 8 to 21 bits.
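
The 8 to 21 bit figure is consistent with adding the processing gain, expressed in bits, to the input wordlength. A worked version follows, assuming the usual rule of thumb of about 6 db of dynamic range per bit and taking the 40 db end of the gain range at both ends of the input range (the function name is hypothetical):

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 db of dynamic range per bit


def output_bits(input_bits, gain_db):
    """Input wordlength plus processing gain rounded up to whole bits."""
    return input_bits + math.ceil(gain_db / DB_PER_BIT)
```

Under these assumptions a 1 bit input with 40 db of gain needs 8 bits at the output, and a 14 bit input needs 21 bits.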

The long word lengths developed above should not be accepted as
a definitive argument for setting processor wordlengths. There are
fundamental limitations as well as systems requirements limitations
that allow the practical use of much shorter wordlengths.

3.5.3.2.1 Signal Detection

In many applications, the signal processor is being used to
permit detecting a small signal in the presence of noise.
Acceptable detection can be achieved when the signal to noise ratio
at the output of the signal processor is 10 to 13 db. If we force
the quantizing noise generated during the processing to be small
compared to the background noise accompanying the signal, this
level of performance can be obtained using 3 or 4 bits.

Of course there will be situations when the signal is
substantially larger than the background noise. For this situation
we must either provide more bits to handle the larger signal, or we
must allow for scaling within the processor to prevent overflow of
larger signals.

3.5.3.2.2 Signal Separation

An alternative to the signal detection situation occurs in the
signal separation case. In this instance a small signal is to be
observed in the vicinity of some larger signal. In this situation
the system performance is frequently limited by the capabilities of
the waveforms. An example of this type of limitation is furnished
by pulse compression systems. In such systems, the main response is
centered in a region of sidelobes. The sidelobe level is set by a
combination of the time-bandwidth product of the waveform and the
weighting function used for sidelobe reduction. The weighting
function selection involves trading off resolution and
detectability to obtain lower sidelobes. The sidelobe levels sought
in practical systems range from 35 to 50 db. Because of these
limitations, in the vicinity of a strong target, a 60 db output
dynamic range would permit observation of the strong target's
sidelobes and any smaller target that exceeded the sidelobes.

3.5.3.2.3 Intense Clutter Background

A most severe processing situation occurs when pulsed doppler
radars are used to examine low flying targets from a look down
position. Systems of this type can require 60 to 90 db of
discrimination against the clutter background. In this situation,
the output dynamic range requirements are moderate. The input
dynamic range requirements are more severe because the target
signal must exceed the A/D quantizing noise in the narrow bandwidth
occupied by the target. The most severe requirement is imposed by
filter stop band attenuation, which excludes the strong clutter
from the detection process. As an example, in a system with a 60 db
clutter to target ratio and a 1000:1 bandwidth reduction, we have
30 db of processing gain against A/D quantizing noise. To obtain a
15 db signal to quantizing noise ratio, a 7 bit A/D is required. In
addition, 75 db of stop band attenuation must be provided.

3.5.3.3 Gain Adjustment

In several of the cases examined above, attention was
concentrated on making a local observation of a weak target. The
aspect of a strong target was not of major concern. Using this
outlook, the presence of strong signals requires accommodation by
some form of gain adjustment, or by increasing wordlength. This
implies providing gain adjustment for the receivers, and the
allocation of extra bits to the A/D converter before the processor.

The processor accommodation will require the use of fixed point
scaling, or some form of full floating point, or block floating
point operation.


3.5.4  Processor  Wordlength 

The major function of a signal processor is filtering. This
filtering involves multiplying input signal samples by weighting
coefficients and adding these products and other products to
obtain the filtered results.

The wordlength required for the weighting coefficients (which
affects response shape) can be treated separately from the
wordlength used for computing and storage (which affects dynamic
range).

3.5.4.1 Weighting Coefficient Wordlength

A filter's response is set by its poles and zeroes. It is
possible to separate the examination of zeroes-only filters from
filters having only poles.

3.5.4.1.1 Zeroes Only Filters

Chan and Rabiner [1] resolved the coefficient wordlength
requirements of a zeroes only (FIR - Finite Impulse Response)
filter with the relation that the inband rejection is given by

      DL_A \geq -20 \log_{10} ( ... )                              (1)

where t is the word length exclusive of sign, N the number of
samples in the impulse response, DL_I the desired inband rejection
with ideal implementation, and DL_A the achieved inband rejection.
The symbol \geq means that most of the time this level of
performance will be achieved. Since this relation is based on the
round off statistics, occasionally the results will be worse than
predicted.

Inband rejection has the obvious meaning in the filter stop
band, i.e., how many db of rejection do you get in the stop band.
In the pass band, it is a measure of the deviation of the response
from the desired response. Based on this concept, we can construct
Table 16.

In consideration that stop band attenuation of 40 to 60 db is
frequently required, we can observe that the stop band performance
will generally set the required number of bits for the
coefficients.
TABLE 16 - CONVERSION OF PASS BAND RIPPLE TO IN BAND REJECTION

In Band Ripple     In Band Deviation     In Band Rejection
     db                   db                    db

    0.2                  0.1                  -38.8
    0.5                  0.25                 -30.7
    1.0                  0.5                  -24.5
    3.0                  1.5                  -14.5
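
The entries of Table 16 appear to follow from treating the in band deviation (half the ripple) as a fractional amplitude error and expressing that error in db. The formula below is inferred from the tabulated values rather than stated in the text; it reproduces them to within about 0.1 db. The function name is an assumption.

```python
import math


def ripple_to_rejection_db(ripple_db):
    """Convert pass band ripple (db) to the equivalent in band rejection (db)."""
    deviation_db = ripple_db / 2.0                 # deviation either side of nominal
    error = 10 ** (deviation_db / 20.0) - 1.0      # fractional amplitude error
    return 20.0 * math.log10(error)
```

For example, a 0.5 db ripple gives a deviation of 0.25 db, an amplitude error of about 2.9 percent, and hence roughly -30.7 db.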

Figure 56 was constructed from eqn. (1) using N equal to 32 to show the effect of different coefficient wordlengths on stop band rejection. N=32 was used because experience has indicated that filters having very long impulse responses can be implemented using multiple filter stages with sample thinning between stages [2]. From Figure 56 we can observe that 40 db performance can be achieved with a 50 db design and 10 bit weights, 50 db performance with a 60 db design and 12 bit weights, and 60 db performance with a 70 db design and 14 bit weights.
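
The three design points quoted above can be explored numerically. The sketch below is only an assumed reading of relation (1): it takes the coefficient rounding error in the frequency response to have a standard deviation of at most 2^(-t) sqrt((2N-1)/12), and adds a two-sigma margin to the ideal response, giving DL* ≳ -20 log10[10^(-DL/20) + 2^(-t) sqrt((2N-1)/3)]. This form, the function name, and the treatment of a b-bit weight as t = b - 1 bits exclusive of sign are assumptions, not statements from the text. With them, the 10, 12 and 14 bit cases come out near 38, 50 and 61 db.

```python
import math


def achieved_rejection_db(design_db, t, n):
    """Assumed form of relation (1): achieved inband rejection in db.

    design_db : rejection of the ideal (unquantized) design, in db
    t         : coefficient word length exclusive of sign
    n         : number of samples in the impulse response
    """
    ideal_level = 10.0 ** (-design_db / 20.0)
    # Two-sigma bound on the response error caused by coefficient rounding.
    rounding_level = 2.0 ** (-t) * math.sqrt((2 * n - 1) / 3.0)
    return -20.0 * math.log10(ideal_level + rounding_level)


for design_db, bits in ((50, 10), (60, 12), (70, 14)):
    t = bits - 1  # assumed: one of the quoted bits is the sign
    print(bits, design_db, round(achieved_rejection_db(design_db, t, 32), 1))
```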

3.5.4.1.2 Filters Having Poles

For  filters  having  poles,  we  refer  to  two 
reasonably  high  performance  filter  designs  as  examples  on  which 
to  base  our  judgements. 

The first example was discussed by Crochiere (16). This filter is an 8 pole, bandpass, elliptic filter with the following characteristics:


Bandwidth/fo  5%

Transition width/fo  25%

Pass band Ripple  0.5db

Stop band Attenuation  40db

with fo the sampling rate.
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Several  different  structures  were  used  to 
implement  this  filter,  and  it  was  concluded  that 

(a)  Maintaining  passband  flatness  required  more  coefficient 
accuracy  than  maintaining  stop  band  attenuation. 

(b)  The  structure  using  a cascade  of  2-pole  sections  required 
9 bit  coefficients. 

An  example  from  Raytheon  experience  was 
an  8 pole  low  pass  elliptic  filter  with  the  following  character- 
istics: 


Band  width/fo  5.5% 

Transition  width/fo  3% 

Pass  band  Ripple  0.5db 

Stop  Band  Attenuation  75db 


This  filter  was  implemented  as  a cascade 
of  2-pole  sections  and  required  10  bit  coefficients  to  achieve 
the  desired  performance  levels. 

Although these two filters do not represent a full range of filter requirements, they do indicate that 10 bit coefficient word lengths can give high quality performance in filters with poles.

3.5.4.1.3 Coefficient Wordlength Recommendation

For filtering systems, based on 3.5.4.1.1 and 3.5.4.1.2, and in consideration that wordlengths come in convenient 4-bit sections, a 12 bit coefficient word length would provide an adequate range of capability.


3.5.4.2 Wordlength for Computation

During processing, wordlengths will be reduced after multiplication. These round off operations inject noise into the system that propagates to the output. As with the case for coefficient wordlengths, we separate the zeros-only filter from the filter having poles.

3.5.4.2.1 Round Off Noise in Zeros Only Filters

There are two situations of interest here. The first case is that of direct form implementation. In the direct form, each round off produces noise which propagates to the output with unit gain. For an N stage linear phase filter, there are N/2 round off multiplications. These produce an output noise variance:

$$ \sigma_o^2 = \frac{N}{2}\,\sigma_e^2 \qquad (2) $$

For a b-bit word including sign:

$$ \sigma_e^2 = \frac{2^{-2(b-1)}}{12} \qquad (3) $$

The peak signal the filter output can have is unity. Combining these, the dynamic range of the filter output becomes

$$ DR = 10 \log_{10}\left[\frac{2^{2(b-1)} \times 24}{N}\right] \qquad (4) $$

Evaluation of (4) is shown in Table 17.

TABLE 17

DYNAMIC RANGE OF ZEROS ONLY FILTERS (db)


  b         N = 10

  8           46
 10           58
 12           70
 14           82
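
The entries of Table 17 can be reproduced with a few lines of arithmetic. The sketch below assumes a rounding noise variance of 2^(-2(b-1))/12 per multiplication, N/2 roundings reaching the output with unit gain, and a unity peak output. The function name is illustrative.

```python
import math


def fir_dynamic_range_db(b, n):
    """Dynamic range (db) of a direct form linear phase zeros-only filter.

    b : word length including sign
    n : number of filter stages (n/2 rounded multiplications assumed)
    """
    noise_per_rounding = 2.0 ** (-2 * (b - 1)) / 12.0
    output_noise = (n / 2.0) * noise_per_rounding
    peak_signal_power = 1.0  # peak output taken as unity
    return 10.0 * math.log10(peak_signal_power / output_noise)


for b in (8, 10, 12, 14):
    print(b, round(fir_dynamic_range_db(b, 10)))
```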
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For  the  second  case  we  treat  a cascade 
filter  arrangement  where  noise  generated  in  the  earlier  stages 
is  filtered  by  the  action  of  subsequent  stages.  We  choose  for 
this example the FFT, because of its wide usage. We examine the highest frequency term of the FFT because it has the most multiplies. Figure 57 shows its flow chart in pruned form and includes injected noise powers of q²/3 at each of the nodes, and noise power gains of 1/2. A noise power of q²/3 was used because four multiplies are used in the complex twiddle.


Figure 57 - FFT Pertinent to Noise Analysis


From Figure 57, the output noise power is

$$ N_{out} = \ldots \qquad (5) $$

Extending this result, and taking the maximum output of the FFT as unity, we have

$$ DR = 10 \log_{10}\left[\frac{3 \times 2^{2(b-1)}}{\log_2 N}\right] \qquad (6) $$

with N the number of points in the FFT and b the word length. Evaluation of (6) is shown in Table 18.
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TABLE  18 

DYNAMIC RANGE OF FFT WITH 1/2 POWER SCALING (db)


  b       N =   8     16     32     64

  8            42     41     40     39
 10            54     53     52     51
 12            66     65     64     63
 14            78     77     76     75
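
Table 18 can be regenerated from a single expression. The sketch below assumes that the output noise power accumulates as q²/3 per FFT stage (log2 N stages in all) with q = 2^(-(b-1)), and that the maximum FFT output is unity, so that DR = 10 log10[3 x 2^(2(b-1)) / log2 N]. This form was checked only against the tabulated values, and the function name is illustrative.

```python
import math


def fft_dynamic_range_db(b, n_points):
    """Dynamic range (db) of an FFT with 1/2 power scaling per stage.

    b        : word length including sign
    n_points : number of points in the FFT (a power of two)
    """
    q_squared = 2.0 ** (-2 * (b - 1))
    stages = math.log2(n_points)
    output_noise = stages * q_squared / 3.0  # assumed: q^2/3 added per stage
    return 10.0 * math.log10(1.0 / output_noise)


for b in (8, 10, 12, 14):
    print(b, [round(fft_dynamic_range_db(b, n)) for n in (8, 16, 32, 64)])
```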

3.5.4.2.2 Round Off in Filters with Poles

The filters used in examining coefficient wordlengths will also be used in examining round off. We take the implementation as being a cascade of 2-pole sections.

For a second order section, the output noise variance is given by [7],

$$ \sigma_o^2 = \frac{2^{-2(b-1)}}{12} \cdot \frac{1+r^2}{1-r^2} \cdot \frac{1}{r^4 + 1 - 2r^2\cos 2\theta} \qquad (7) $$

where r, θ are the polar coordinates of the poles. The pole radius is limited to unit value for a stable filter.

Equation (7) indicates that the poles with the largest value of r produce the greatest output variance. In a cascade arrangement, the poles with small radius should be placed at the end of the filter to reduce the effect of large pole sections by filtering. In this arrangement, the output noise will be dominated by the contribution of the last section.

With this choice, the results of Table 19 were computed for the Crochiere and Raytheon filters.
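
The dependence of the round off noise on pole position can be illustrated numerically. The sketch below assumes the usual second order section result, sigma² = (2^(-2(b-1))/12) (1+r²)/(1-r²) / (r⁴ + 1 - 2r² cos 2θ), with r and θ the pole radius and angle. The function name and the sample pole positions are illustrative and are not the poles of the Crochiere or Raytheon filters.

```python
import math


def second_order_noise_variance(b, r, theta):
    """Output round off noise variance of one 2-pole section.

    b     : word length including sign
    r     : pole radius (must be below 1 for a stable section)
    theta : pole angle in radians
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("pole radius must satisfy 0 <= r < 1")
    rounding_variance = 2.0 ** (-2 * (b - 1)) / 12.0
    pole_gain = (1.0 + r * r) / (1.0 - r * r)
    pole_gain /= r ** 4 + 1.0 - 2.0 * r * r * math.cos(2.0 * theta)
    return rounding_variance * pole_gain


# Poles closer to the unit circle give a larger output variance.
for radius in (0.5, 0.9, 0.99):
    variance = second_order_noise_variance(12, radius, math.pi / 4)
    print(radius, 10.0 * math.log10(1.0 / variance))
```

Placing the small radius sections last, as recommended above, means that the variance of the final section evaluated this way sets the output noise floor.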
TABLE 19

DYNAMIC  RANGE  OF  TWO  FILTERS  BASED  ON  LAST  STAGE 


Crochiere Filter  Raytheon Filter


and 3.5.4.2.2, and in consideration that wordlengths come in convenient 4 bit sections, a 12 bit processing wordlength is recommended.

3.5.4.3 Severe Requirements

It  is  to  be  anticipated  that  some  applications 
will  require  greater  performance  levels  than  those  assumed  here. 
In  anticipation  of  these  requirements,  multiple  precision  opera- 
tions should  be  available  in  the  system. 
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SECTION  IV 

STATE-OF-THE-ART  SURVEY 

4.1 Introduction

This task involved a survey of logic building blocks and their trends to postulate compatible micro signal processor design features. The objective is to maximize the use of commercially available and compatible LSI chips for now and some few years to come. Many surveys have appeared recently trying to project IC developments, such as the article "Trends in Computer Hardware Technology" by D. Hodges in the February 1976 issue of Computer Design. The key elements we have superimposed on these technology predictions are our past experience and our present healthy skepticism.

Standard building blocks began emerging in the MSI era of ICs. Exactly the same function could be obtained independently of the technology family desired, whether it be CMOS, TTL, LS/TTL or ECL 10K. Examples were selectors, hex D flops, counters, arithmetic logic units (ALU), scratchpads and priority encoders. Standardization allowed the manufacturer lower development costs and risk and higher volume, while giving the user lower prices, familiar functions, a shorter design cycle and an intelligible design.

Today a number of standard functions are emerging in the LSI technology, although their design details are not 100 percent identical. Examples are the CPU slice with its register file and ALU, sequencers, input-output bus/communicators, first-in first-out queues (FIFOs), last-in first-out queues (LIFOs or Stacks), and high density RAMs, ROMs, PROMs, PLAs and FPLAs. These offerings will be judged on their survival likelihood as well as their usefulness.


The  focus  of  this  survey  is  on  analysis  of  features  and 
trends,  rather  than  on  mountains  of  raw  comparisons.  The  latter 
is  typified  by  the  selection  of  micro  processor  references  given 
in  Table  20. 

TABLE  20 

SELECTED  MICRO  PROCESSOR  REFERENCES 


(18)  Eugene Hnatek, "Chipping Away At Core," Digital Design, July 1976, p. 31-42.

(19)  Jean Nicoud, "Peripheral Interface Standards For Microprocessors," Proc. IEEE, June 1976, p. 896-904.

(20)  A. Williams & H. Jelinek, "Introduction to LSI Microprocessor Developments," Computer, June 1976, p. 34-46.

(21)  EDN MICROCOMPUTER SYSTEMS DIRECTORY, Cahners Publishing Co., 1975.

(22)  MICROPROCESSOR SCORECARD, Microcomputer Techniques, Reston Virginia, Mini Micro Systems, July 1976.

(23)  MICROCOMPUTER D.A.T.A. BOOK EDITION 1, 1976, D.A.T.A. BOOK INFORMATION SERVICE, Orange, N.J.


The design philosophy here is to employ the best mixture of available semiconductor technology (Figure 58). The MOS devices form complete CPU's and will be postulated for slower speed peripheral control and communication tasks. The commercial bipolar LSI building blocks are to be employed for the high speed heart of the micro signal processor. Functional areas where significant performance, size or availability improvements exist over commercial LSI will be considered for implementation by Raytheon's programmable gate array capability. This partitioning of application areas is expected to remain constant for a number of years, even considering that MOS speeds and bipolar densities will continue their upward spirals.

Figure 58 - Available Semiconductor Technology

4.2 General Trends

Some  key  features  of  MOS  devices  are  presented  in  figure  59. 
The  fixed  instruction  formats  preclude  most  architectures  not 
given  by  the  manufacturer.  MOS  microprocessor  thruput  is  low 
(microsecond  add  times) , although  performance  on  instruction  mixes 
may  be  drastically  improved  when  special  arithmetic  hardware  is 
added,  such  as  a bus-oriented  multiplier.  Nevertheless,  MOS  micro- 
processors exist  for  a host  of  peripheral  controller  tasks,  as 
shown in figure 60. For most of those tasks, speed improvements are irrelevant given the mechanical or human factor limitations. A notable exception is interval timing.

The  bipolar  building  blocks  offer  a number  of  advantages 
as  listed  in  figure  61.  Most  significantly,  we  can  configure 
them  into  an  advanced  architecture  for  high  thruput  signal 
processing,  while  still  meshing  with  MOS  LSI  for  peripheral  inter- 
facing. A preview  of  the  types  of  building  blocks  available  to- 
day or  within  the  coming  year  is  shown  in  figure  62.  Note  that 
some  fast  bipolar  devices  such  as  8K  PROMs  and  64  word  FIFO's 
have  been  available  for  two  or  three  years  in  slower  MOS  forms. 

In  choosing  the  bipolar  LSI  building  blocks  for  the  Micro 
Signal  Processor  we  shall  not  hesitate  to  mix  the  best  designs 
from  one  chip  set  with  complementary  chips  from  other  chip  sets. 

For immediate production-oriented efforts such a policy is hazardous, since one must depend on the survival of several competing design sets. Here, however, we are selecting particular LSI IC's only as indicative of a trend toward certain function blocks and including certain features. We expect that the surviving bipolar LSI family (or families) will eventually incorporate the most marketable aspects of their fallen competitors' designs. An overview of the present LSI Microprocessor Families is given in Table 21. The twenty entries listed in the table exclude inappropriate designs such as the slower PMOS types, the older 4 bit microprocessors, and a few false starts such as Intel's 8008 and 3000 series. A variety of technologies still remain, depending on tradeoff of speed, low power, chip density and manufacturing costs.

Figure 59 - MOS CPU/Microprocessor

Figure 60

Figure 61 - Bipolar LSI Building Blocks

Figure 62

Clear trends include:

• Domination and interrelation of 8 and 16 bit machines, where most 8 bit machines have 16 bit address fields, and some 16 bit machines actually process data in 8 bit pieces.

• Bipolar slices come in 4 bit widths.

• MOS device speeds are close to each other, but an order of magnitude slower than bipolar speeds.

• Inclusion of a significant number of registers (RAM) within the CPU, with 16 words common and some with 64 words.

• Operation with only one 5 volt power supply.

• Recognition of the need for multiple sources, with the 8080 and 2900 leading with 5 and 4 sources respectively.

• A large number of microprocessors are really versions of other computers, including Intersil's 6100 (PDP-8), TMS9900 (TI990), Data General micro NOVA, and Western Digital's 1600 (PDP-11).

• Improvements on existing microprocessors are occurring rapidly, including the Z80 (on the 8080) and the 6502 (on the 6800).

4.3  LSI  IC  Features 

A detailed  examination  of  specific  LSI  chip  types  was  made 
and  compared  in  Table  22,  Table  23  and  Table  24.  Table  22  lists 
bipolar  sequencer  types,  followed  by  CPU  slice  types.  Table  23 
covers  MOS  CPU  types.  Table  24  covers  MOS  peripheral  controller 
types.  Significant  effort  was  put  into  understanding  the  features 
available  across  a host  of  MOS  devices  in  order  to  postulate  fea- 
tures likely  to  be  seen  in  a few  years  in  higher  speed  building 
blocks.  The  presence  within  an  LSI  IC  of  significant  logic  seg- 
ments for  performing  an  identified  computer  task  forms  the  fea- 
tures listed  on  the  left  side  of  the  tables.  These  features  are 
clarified  in  the  remaining  paragraphs  of  this  section. 

Bit Width represents the number of data bits which an IC appears to handle in parallel to the outside world. Bipolar sequencers are either 4 bit slices or a fixed number of bits (up to 10 today). RALU slices are overwhelmingly 4 bits wide. Serious interest in 8 or 16 bits is exhibited in the MOS world.


Program  Address  Counter  is  present  in  all  sequencers  and  MOS 
CPU's  but  rarely  in  peripheral  controllers  or  RALU  slices.  Most 
exceptions  are  caused  by  the  F8  family,  which  distributes  the 
address  function  to  some  bus  communication. 

Program Stack is really a Last-In, First-Out (LIFO) queue. Its primary function is to store subroutine return addresses. Stack sizes vary from 4 to 16 words, with clear dominance of 4 words in bipolar sequencers. Most MOS CPU's achieve this function in other ways such as software pointers to a main memory stack. We thus expect to see a continuation of stacks in bipolar sequencers as long as a sequencer function is done on a distinct chip.
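
The role of the program stack can be sketched in a few lines. The model below is purely illustrative: the class and function names are hypothetical, the depth of 4 words is taken from the dominant bipolar sequencer size mentioned above, and no particular sequencer chip is being described.

```python
class ReturnStack:
    """Fixed-depth LIFO holding subroutine return addresses."""

    def __init__(self, depth=4):
        self.depth = depth  # 4 words assumed, as in most bipolar sequencers
        self._words = []

    def push(self, address):
        if len(self._words) >= self.depth:
            raise OverflowError("return stack full")
        self._words.append(address)

    def pop(self):
        if not self._words:
            raise IndexError("return stack empty")
        return self._words.pop()


def call(stack, program_counter, target):
    """Save the address following the call, then jump to the subroutine."""
    stack.push(program_counter + 1)
    return target


def ret(stack):
    """Resume at the most recently saved return address."""
    return stack.pop()


stack = ReturnStack()
pc = call(stack, 10, 100)   # outer subroutine called from address 10
pc = call(stack, 105, 200)  # nested subroutine called from address 105
print(ret(stack), ret(stack))
```

Nesting deeper than the stack depth raises an error in this model; a hardware stack of fixed size has no such warning, which is one reason the nesting depth matters when choosing a sequencer.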


Multi-Way Program Counter Increment indicates that address sequencing can occur in more than just "current + 1" or "jump to address input." This feature is important for efficient multi-way branching, and can eliminate much of the need for look ahead on nested loops in signal processors. Note that the Intel 3001
is  the  only  bipolar  sequencer  which  doesn't  provide  address  incre- 
menting, but  instead  uses  jumping  by  logical  bit  masking.  Hence, 
we  conclude  that  the  jumping  sequencer,  whatever  its  technical 
merits,  is  not  going  to  be  the  mainstream  approach.  Multi-way 
program  counter  incrementing  is  less  important  for  MOS  micropro- 
cessors, where  the  instruction  set  is  already  fixed. 

Instruction  Decoding  varies  in  complexity  from  a few  gates 
to  a full-fledged  PROM  or  PLA  decoding  logic.  The  trade-off  is 
between  execution  time  and  generality  versus  minimum  number  of 
control  pins  and  control  PROM  bits.  The  control  decoder  in  most 
MOS  microprocessors  is  primarily  devoted  to  the  ALU  control, 
thereby  specializing  the  arithmetic  operations  to  the  associated 
fixed  instruction  set.  For  this  project  we  would  prefer  a more 
general  device  having  extra  speed,  and  pay  the  price  in  extra 
bits  and  pins. 

Program ROM is now found only in selected MOS microprocessor chips which are oriented toward making systems with an absolute minimum number of chips. Sizes are 1-2K words by 8 bits. Because of volume production considerations, we expect microprocessors with PROM or EROM included to be developed, but not to replace separate PROM/ROM IC's. Having such a program memory converts a general microprocessor into a special peripheral controller element.
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Status/Flags/Interrupt  Flip-Flops  and  Logic  is  the  minimum  amount 
of  miscellaneous  interchip  communication  needed  to  make  a working 
system.  All  MOS  IC's  incorporate  some  of  this,  but  only  a few 
bipolar  sequencers  have  these  logic  odds  and  ends.  Use  of  unper- 
sonalized arrays  allows  signal  processor  designs  to  be  indepen- 
dent of  any  one  manufacturer's  approach  to  the  design  and  part- 
itioning of  this  function  need. 

Arithmetic Logic Unit and Shifter is the key element to differentiate a sequencer chip from a CPU slice chip, or a microprocessor from a peripheral control chip. Note that the shifters are always only one bit up or down, and never the full shift-barrel desired for either true floating point or at least block scaling.

Register Files are clearly going to grow from the 16 words found in some RALU's to the 64 and 128 words already seen in some MOS units. Signal processors can benefit from this not in the data area where megabits of RAM are likely, but in the addressing and subroutine parameter areas. Benefits to those areas include eliminating several external IC's for tasks needing only a small amount of working storage, and a faster interrupt handling mechanism by activating separate register segments.

A Shift  Register  apart  from  the  RALU  allows  faster  instruc- 
tion execution,  such  as  multiplication  and  shifting.  This  feature 
is  not  dominant,  but  useful  enough  and  prevalent  enough  to  post- 
ulate as  included  in  our  ideal  building  block. 

An Address Output Port separate from the I/O Bus ports is a desirable, but expensive feature. Four and sixteen bit MOS CPU's sacrifice it to pin limitations, while most 8 bit microprocessors have it. Bipolar sequencers all have it, but only some RALU's have it. To be of significant benefit, this address output port must be tri-state, and operable simultaneously with altering the data port setting. I/O bus ports are a feature which comes in all combinations of sizes in MOS peripheral IC's, and maybe should be done in that manner so long as high speed is not needed.

Clock,  Reset  and  Hold  Clock  are  features  that  save  external 
miscellaneous  logic.  We  expect  that  the  current  multi-phase 
clock  input  to  MOS  chips  will  pass  away,  but  that  clocks  will 
not  be  included  within  the  high  speed  bipolar  chips  for  quite  a 
while.  An  automatic  reset  when  power  is  turned  on,  such  as 
found  in  Signetics  8X02,  is  very  nice  for  sequencers  to  start  in 
a predetermined  location,  but  irrelevant  for  a signal  processor's 
data  paths.  Holding  the  clock  line  allows  use  with  varying  mem- 
ory access  timing  or  instruction  execution  cycles,  but  today  is 
available  only  for  the  8080,  hardly  a trend. 

The  remaining  features,  such  as  interval  timing,  fancy  shift 
register  I/O,  are  found  primarily  on  special  IC  types. 


4.4  Arithmetics 

The  available  high  speed  CPU  slices,  such  as  the  2901, 
appear  to  be  very  useful  for  the  micro  signal  processor  control 
elements,  but  not  suitable  for  the  micro  signal  processor's 
arithmetic unit. These RALU chips need more input ports, a pipelining orientation, and shifting immediately following arithmetics before output at the very least. The large quantity of storage words within the chip is largely unused for operations involving only two or three complex vectors.

Progress is being made in the desired direction. Variations on the 4-bit slice RALU are now appearing, including MMI's pipeline type, TI's '481 with more ports, and Motorola's 10800. Such variations on an accepted design may be more acceptable to the marketplace than drastically different CPU slice concepts. Furthermore, where a few mask routing changes can convert an existing design to a more desirable form, the cost of improvement is small compared with starting from scratch. The most useful such variation is AMD's multi-port RAM, which is just a portion of their popular 2901.

The arithmetic format preferred within the micro signal processor is two's complement. Sign magnitude allows greater ease in mechanizing normalization, scaling, magnituding, and multiplying, while two's complement allows greater ease in addition/subtraction, double-precision expansion and commercial LSI compatibility. Having built programmable signal processors with both formats in the past, we recommend two's complement for LSI compatibility.
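
The two advantages claimed for two's complement can be illustrated with a short sketch. The helper names are hypothetical, and the 12 and 24 bit widths are chosen only to match the wordlength recommended earlier. Addition needs nothing more than a binary adder with the carry out discarded, and double-precision expansion is a sign extension.

```python
def to_twos(value, bits):
    """Encode a signed integer as a two's complement word of the given width."""
    return value & ((1 << bits) - 1)


def from_twos(word, bits):
    """Decode a two's complement word back to a signed integer."""
    if word & (1 << (bits - 1)):
        return word - (1 << bits)
    return word


def add_twos(a, b, bits):
    """Add two words with a plain binary adder; the carry out is dropped."""
    return (a + b) & ((1 << bits) - 1)


def sign_extend(word, bits, new_bits):
    """Expand a word to a longer width by replicating its sign bit."""
    if word & (1 << (bits - 1)):
        word |= ((1 << new_bits) - 1) & ~((1 << bits) - 1)
    return word


total = add_twos(to_twos(-5, 12), to_twos(3, 12), 12)
print(from_twos(total, 12))
print(from_twos(sign_extend(to_twos(-5, 12), 12, 24), 24))
```

A sign magnitude adder would first have to compare signs and magnitudes to decide whether to add or subtract, which is the extra logic the text alludes to.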

A key  arithmetic  task  is  multiplication,  which  often  takes 
as  much  as  1/6  of  the  total  signal  processor  ICs.  By  comparative 
surveys  of  alternative  multiplication  schemes,  such  as  shown  in 
Table  25,  the  following  observations  can  be  made: 

• A micro processor is inherently inefficient as a multiplier

• Serial-parallel multipliers are extremely suitable to LSI because of the small number of outputs, and timing domination by internal loop sizes

• Eventual availability of a bus-oriented multiplier will add considerably to the capability of even slow micro processors, but will not create a micro signal processor

• A true combinatorial-logic multiplier would be nice, but the most popular sizes would be unsuitable for signal processing. Our experience has been that multiplier and multiplicand do not need to be the same size, but a 12 bit by 12 bit still would be suitable for all fast applications. Slower processing can then give double-precision capability. Yet the commercial world tends to think either of 8 by 8 or 16 by 16.

TABLE 25

SURVEY OF DIGITAL MULTIPLIERS


Approach                     Size    IC Qty.                         No.     Time Per  Rel.        Comments
                                                                     Clocks  Multiply  Efficiency
                                                                             (nsec)

Combinatorial Logic - Ideal  8x8     1 - 34 pins (~3)                1       50        1           Not yet commercial
Bus Oriented of MMI          16x16   1 - 24 pins (~2)                12      100       1/4         9-12 mo. away
Ser. Parallel AMD 25LS14     8x8     5 - 16 pins (~5)                8       50        1/13        Better design possible
Combinatorial AMD 25LS05     8x8     8 - 24 pins, 2 - 16 pins (~20)  1       150       1/20        Simplest, multi source
Micro Proces. AMD 2901       8x8     2 - 40 pins, 1 - 16 pins (11)   12      100       1/60        Useful if few mult's

Note: Rel. Effic. is formed from IC Qty, Size relative to 8x8, and 50 relative to No. Clocks x Time Per Multiply.
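
The relative efficiency column of Table 25 can be explored with a short calculation. The sketch below rests on two assumptions that are not confirmed by the text: that the approximate figure listed with each IC quantity is an equivalent IC count, and that the efficiency is the product size / (IC count x clocks x time), normalized to the ideal combinatorial multiplier. Under this reading the first four rows are reproduced (1, 1/4, about 1/13 and 1/20); the microprocessor row, tabulated as 1/60, comes out nearer 1/88 with an equivalent count of 11, so the reading should be treated with caution.

```python
def figure_of_merit(ic_equivalent, size_bits, clocks, time_nsec):
    """Multiplier bits delivered per equivalent IC per unit time (assumed form)."""
    return size_bits / (ic_equivalent * clocks * time_nsec)


# Reference row: ideal 8x8 combinatorial multiplier, ~3 ICs, 1 clock, 50 nsec.
IDEAL = figure_of_merit(3, 8 * 8, 1, 50)


def relative_efficiency(ic_equivalent, size_bits, clocks, time_nsec):
    """Efficiency relative to the ideal combinatorial multiplier."""
    return figure_of_merit(ic_equivalent, size_bits, clocks, time_nsec) / IDEAL


rows = {
    "Bus oriented (MMI)": (2, 16 * 16, 12, 100),
    "Serial-parallel (25LS14)": (5, 8 * 8, 8, 50),
    "Combinatorial (25LS05)": (20, 8 * 8, 1, 150),
    "Micro processor (2901)": (11, 8 * 8, 12, 100),
}
for name, row in rows.items():
    print(name, "1/%.0f" % (1.0 / relative_efficiency(*row)))
```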


The  set  of  TRW  multipliers  recently  announced  are  an  inter- 
esting item  to  watch.  The  key  question  is  whether  the  commercial 
semiconductor  vendors  will  second  source  that  device  approach. 
Problem  areas  include  a different  process  than  the  industry 
mainstream,  several  times  more  power  per  chip  than  the  usual  max- 
imums,  and  speeds  on  the  slow  side  of  bipolar  clock  cycles.  We 
do  expect  that  these  devices  will  generate  a user  demand,  and 
eventually  focus  industry  supply  into  the  area  of  monolithic 
multipliers,  which  otherwise  has  suffered  from  slow  progress 
over  the  past  five  years. 

Investigation  was  also  made  into  the  usefulness  of  the  1-bit 
serial  approach,  based  on  serial-parallel  multiplier  IC  from  AMD. 
Figure  63  shows  the  latter  device  is  currently  the  most  efficient 
multiplier  type  commercially  available.  The  best  system  design 
that  can  be  developed  to  exploit  it  uses  a system  timing  which 
alternately  reads  n bits  serially  with  writing  back  n bits  ser- 
ially. This  timing  is  a natural  result  of  the  double-length 
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result  produced  with  that  IC  design,  operating  between  two  memor- 
ies. Only  a minimal  amount  of  fixed-function  signal  processing 
can  be  achieved  with  these  serial  devices.  The  net  thruput  per 
IC  still  is  not  as  great  as  a parallel  approach,  because  of  the 
2n  clock  cycles  per  processed  word  despite  the  potential  for  very 
high  speed  clocks.  These  investigations  do,  however,  provide 
further  bounds  on  the  size-thruput  competition  for  the  micro  sig- 
nal processor  design. 

A fast one-step shifter or shift-barrel is a very desirable chip for a pipelined signal processor. Such a device is available today only in small sizes, such as 4 bits of 0-7 or 0-3 shifts. Hence, we investigated the combination of a decoder with a multiplier chip to produce this function with a
minimum  number  of  available  chips.  Further  examination  of  the 
power  and  speed  penalties  have  caused  us  to  reconsider  the  merits 
of  this  concept.  Instead  we  propose  to  do  this  function  as  part 
of  a gate  array,  or  to  await  the  introduction  of  a commercial  IC, 
as  is  now  planned  for  the  ECL  10800  family. 
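
The decoder plus multiplier idea considered above can be sketched as follows. The function names and the 12 bit width are illustrative assumptions. The decoder turns a shift count k into the one-hot word 2^k, and multiplying the data word by that value shifts it up by k places in a single step.

```python
def decode_shift(count):
    """Decoder: turn a shift count into a one-hot multiplier operand (2**count)."""
    return 1 << count


def barrel_shift_left(word, count, bits=12):
    """One-step left shift built from a decoder and a multiplier.

    The product is truncated to the word width, as a fixed-width data path
    would do; bits shifted past the top are lost.
    """
    product = word * decode_shift(count)
    return product & ((1 << bits) - 1)


print(bin(barrel_shift_left(0b101, 3)))
```

The sketch shows only the arithmetic equivalence; it says nothing about the power and speed penalties that led us to prefer a gate array or a commercial shifter IC.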

4.5 Memory

Our  micro  signal  processor  design  emphasizes  replacing 
registers  and  selectors  wherever  possible  with  scratchpads  for 
cost  and  component  count  savings.  Table  26  shows  the  relative 
cost  per  bit  savings  obtained  with  denser,  but  slower  storage 
elements . 


TABLE 26

MEMORY TRADEOFFS


Storage Type              Packaged Cost    System Speed
                          Per Bit          (nsec)

Register - LS TTL         50               25
16 x 4 RAM - LS TTL       8                40
256 x 4 RAM - BIPOLAR     8                70
1024 x 1 RAM - BIPOLAR    4                70
4096 x 1 RAM - MOS        1                400 (cycle time)
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The  clear  trend  for  memories  today  is  towards  higher  bit  den- 
sities, with  downgrading  of  memory  hierarchy  concepts.  Bipolar 
is today 1K by 1 or 256 by 4 with a just-announced IC of 4K by 1. MOS gives 1K by 4 or 4K by 1, with some 16K memories being sampled. Sizes most likely to emerge with the 16K memories are 16K by 1 and 4K by 4. As memories get denser, the need for a hierarchy of memory speed goes away, saving the overhead required to make this hierarchy invisible to the programmer.

Signal  processor  memory  needs  differ  from  those  of  GP  com- 
puters in  that  read  - write  cycle  time  becomes  the  critical 
parameter,  not  the  faster  access  time  seen  on  some  memories. 

Other  memory  considerations  include  dynamic  operation,  er- 
ror correction  and  line-oriented  memories.  Controlling  refresh 
operations  on  dynamic  memory  can  consume  significant  amounts  of 
a small  computer's  logic,  although  a signal  processor  seldom  al- 
lows data  to  sit  still  for  long.  Error  correction  is  being  in- 
cluded in  many  4k  and  16k  memory  systems  because  of  pattern  sen- 
sitivities. Yet  signal  processor  gain  and  thresholding  algor- 
ithms can  tolerate  most  errors  in  data  bits.  Line  or  serial  type 
memories,  such  as  CCDs  will  always  have  the  edge  on  density,  but 
they  necessarily  restrict  the  problem  solution  to  successive  pro- 
cessing algorithms  or  require  separate  working  (RAM)  storage. 

Our  initial  preference  is  for  simplicity  even  at  cost  of  lower 
densities . 

The status of programmable elements is presented in Table 27, covering PROM, ROM, PLA, FPLA and EROM. The trend has been toward larger size DIPs to hold more bits. Erasable ROM's have clearly taken much larger packages than the corresponding PROM, aside from the factor of 4 or more in speed. Fast PROM is in
the  region  of  8K  today  for  single  source,  4K  for  multiple 
sources.  ROM's  are  advocated  only  when  the  volume  of  production 
and  bit  densities  justify  rejection  of  the  PROM  option.  We 
foresee  a new  series  of  PROMs  and  FPLA's  being  developed  which  will 
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incorporate  register  or  latch  buffering  on  either  the  input  or 
the  output  paths . 


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Applications  for  PROM  and  FPLA  in  the  micro  signal  proces- 
sor design  are  in  macro-to-micro  decoding,  macro  program  store, 
tables, and replacement of low density logic. Because of logistic and speed considerations, their use for logic substitution will be minimized.

4.6 Control and Interface

Control  decoding  will  emphasize  the  use  of  memory  devices. 

For  example,  the  macro  to  micro  decoding  will  be  PROM,  to  minimize 
size.  Routing  will  be  done  as  far  as  possible  by  memory  address 
selection  rather  than  identifiable  low-density  selectors.  The 
logistic  headaches  of  PROM  and  PLA  will  receive  considera- 
tion in  this  design  approach. 

Mixing devices from different micro processor families, such as CPU slices of one family with sequencers of another family and interface circuits of a third, allows exploiting the strengths of each different manufacturer's designs, and gives greater performance for fewer pieces. Yet a potential problem exists of having to support a larger variety of captive lines after the inevitable design shakeout occurs. Technological superiority will not guarantee IC survival against a strong competitor's early market penetration, volume yields and use of established manufacturing processes which are industry wide.

Bus  transceivers  and  bidirectional  I/O  port  chips  dominate 
the  interface  IC  developments.  Raytheon's  approach  of  a 
multiclock  macro  controlled  AU  allows  bus  devices  to  efficiently 
share  the  data  paths  between  micro  signal  processor  elements. 

The FIFO is a key part of the proposed control scheme. It
allows separating the instruction sequencing operations from the
remainder of the processes. MOS FIFO's have been available for
a couple of years now. Two bipolar FIFO's have recently been
announced, including a 16 word by 5 bit IC and a 64 word by 4
bit IC. The latter is the preferred unit, as it is larger, is
compatible with the older MOS one, and fits the industry
preference for 4-bit slices.

The more traditional computer interface devices, such as
vector interrupt handlers and programmable interval timers, are
useful only when considering the placement of the micro signal
processor into a problem solving system. Use of such specialized,
GP computer oriented IC's is confined to the vicinity of the 16
bit GP I/O bus of the micro signal processor, which handles
primary mode commands and BITE information. Air Force efforts
towards standard interfaces may have the primary influence in
this area.

Data addressing control will emphasize RALU slices more
than sequencers. Signal processing addressing is normally
incremental, but not always in increments of +1, +2, or -1.
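
The incremental addressing described above can be sketched as
follows. This is only an illustration, not the study's address
generator: the function name address_sequence, the memory size and
the stride values are assumptions chosen for the example.

```python
def address_sequence(base, stride, count, memory_size):
    """Generate `count` addresses starting at `base`, stepping by `stride`.

    Addresses wrap modulo `memory_size`, as a fixed-width address
    register would. All parameters are illustrative assumptions.
    """
    addresses = []
    address = base % memory_size
    for _ in range(count):
        addresses.append(address)
        # Python's % keeps the result non-negative, so negative strides wrap.
        address = (address + stride) % memory_size
    return addresses


if __name__ == "__main__":
    # Unit stride, e.g. reading consecutive samples.
    print(address_sequence(base=0, stride=1, count=4, memory_size=64))
    # A stride other than +1, +2 or -1, e.g. one leg of an FFT butterfly.
    print(address_sequence(base=3, stride=8, count=4, memory_size=64))
    # Negative stride with wraparound.
    print(address_sequence(base=1, stride=-1, count=3, memory_size=64))
```

A general adder in the address path (as an RALU slice provides)
handles every stride the same way, which a counter-style sequencer
cannot do.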

An item which could simplify interfacing is an "output signal
multiplexer". This combines a programmable trigger generator
with a multiple output binary rate multiplier. Such an item
could easily be made and have wide application, but has not yet
received the attention of the semiconductor industry.
Applications include radar action timing, replacement of multiple
D/A IC's, phased-array beam steering command generation, netted
computer system control, and others. Design and fabrication of
this item appears feasible using Raytheon's 300 gate array
capability. Figure 64 shows the interface definitions for this
device.
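
The binary rate multiplier principle mentioned above can be
illustrated with a short sketch. It models the classic scheme in
which an n-bit counter gates selected clock pulses so that a rate
word M produces M output pulses per 2^n input clocks. The function
names and the four-bit width are assumptions for illustration; the
report does not give the internal design of the proposed device.

```python
def binary_rate_multiplier(rate, n_bits):
    """Return the output pulse train (list of 0/1) for one counter cycle.

    Over 2**n_bits input clocks exactly `rate` output pulses appear
    (0 <= rate < 2**n_bits).
    """
    if not 0 <= rate < (1 << n_bits):
        raise ValueError("rate must fit in n_bits")
    pulses = []
    for count in range(1 << n_bits):
        # Locate the lowest zero bit of the counter state.
        k = 0
        while k < n_bits and (count >> k) & 1:
            k += 1
        if k == n_bits:
            # All-ones state: no pulse is ever gated through.
            pulses.append(0)
        else:
            # States with lowest zero at position k occur 2**(n_bits-1-k)
            # times per cycle, so they are gated by the rate bit of that weight.
            pulses.append((rate >> (n_bits - 1 - k)) & 1)
    return pulses


def multiple_outputs(rates, n_bits):
    """Several rate words sharing one counter, one pulse train per output."""
    return [binary_rate_multiplier(rate, n_bits) for rate in rates]


if __name__ == "__main__":
    for train in multiple_outputs([1, 5, 12], n_bits=4):
        print(train, "pulses:", sum(train))
```

Averaging each pulse train gives a value proportional to its rate
word, which is how such a device could stand in for several D/A
converters.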

4.7 Selected Building Blocks

A summary of the micro signal processor building blocks
expected from commercial LSI is presented in Table 28. Two examples
are often given for each category--an immediately available type
and a type whose introduction is planned within 6 months to a

Figure 64 - Output Signal Multiplexer Concept


TABLE 28

MICRO SIGNAL PROCESSOR BUILDING BLOCKS

BUILDING BLOCK                          DEVICES (NOW; LATER?)

SEQUENCER
  4 BIT SLICE/20 PINS                   AMD 2911; TI 54S482
  10 BIT UNIT/40 PINS                   SIG 8X02; FCD 9408

RALU
  4 BIT µP SLICE/16 WORDS               AMD 2901
  4 BIT PIPELINE SLICE                  MMI 6702

BUFFER RAM
  16 X 4 DUAL OUTPUT                    AMD 29705
  64 X 4 FIFO                           MMI 674 1

MULTIPLIER; DIVIDER
  8 X 8 COMBINATORIAL                   MMI 6755
  BUS 16 BIT MULT/DIV                   MMI 6750

VECTORED PRIORITY INTERRUPT
  8 INPUTS EXPANDABLE                   INTEL 8259; AMD 2914

RAM
  256 X 4/TS OE/DATA IN/OUT             FCD; RCA CMOS/SOS
  1K X 1 FAST                           FCD
  1K X 4 SLOW STATIC                    AMD, INTEL
  4K X 1 FAST STATIC                    FCD, TI

PROM
  1K X 4 IN 18 PIN DIP                  MULTI-SOURCE
  2K X 4 IN 18 PIN DIP                  SIG

MOS µP's                                8080, 6800, F8; EA9002, Z80

DYNAMIC RAM - 16K

FPLA
  16 X 8 X 48/50 nsec                   MULTI

MISC
  3 TO 8 DECODER                        AMD 25LS 2538

Two types of sequencer are included, a four-bit slice and a
ten-bit slice. Further machine design will determine if both
types are necessary.

Two types of Register-Arithmetic-Logic-Units are included,
the conventional micro processor slice and the pipelined equivalent.
A more optimized pipelined bit slice unit could be made with a
mixture of 300 gate arrays and buffer RAM's such as the
multiport 16 word by 4 bit unit.

FIFO's are desired to mesh variable execution time sequencing
instructions with the fixed execution timing of the arithmetic
pipeline. They are also useful for buffering peripheral
I/O data into a larger block of words which is more manageable
in a software scheduling sense.

Multipliers are an important part of any signal processor.
Thus, we shall assume availability of at least a fast 8 by 8
multiplier, with preference towards larger sizes, such as 8 by 12
or 8 by 16, operating within a pipelined 100 ns clock cycle.
For handling infrequent division operations, the one chip
multicycle approach postulated for the MMI7650 is more desirable
than sending data back to the driving GP computer or performing
multiple passes through the signal processor's arithmetic unit with
a "divide by way of multiply" algorithm.
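
To show why a "divide by way of multiply" approach costs several
passes through the arithmetic unit, the sketch below uses the
Newton-Raphson reciprocal iteration x <- x(2 - dx). The report does
not say which algorithm was considered, so this choice, the starting
estimate and the function names are all assumptions made for
illustration.

```python
import math


def reciprocal_by_multiply(divisor, iterations=4):
    """Approximate 1/divisor using only multiplies and subtractions."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    # Scale the divisor magnitude into [0.5, 1): |divisor| = mantissa * 2**exponent.
    mantissa, exponent = math.frexp(abs(divisor))
    # Linear starting estimate of 1/mantissa on [0.5, 1).
    x = 48.0 / 17.0 - (32.0 / 17.0) * mantissa
    for _ in range(iterations):
        # Each pass needs two multiplies and one subtraction,
        # and roughly doubles the number of correct bits.
        x = x * (2.0 - mantissa * x)
    # Undo the scaling and restore the sign.
    return math.copysign(math.ldexp(x, -exponent), divisor)


def divide_by_multiply(numerator, divisor, iterations=4):
    """Quotient formed as numerator times the iterated reciprocal."""
    return numerator * reciprocal_by_multiply(divisor, iterations)


if __name__ == "__main__":
    print(divide_by_multiply(7.0, 3.0))
    print(divide_by_multiply(1.0, -8.0))
```

Every iteration is a separate pass through the multiplier, which is
the overhead the one chip multicycle divider would avoid.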

Vectored priority interrupt handling is a function which we shall
shape around available designs, since interrupts in a µSP need
not be handled extremely efficiently or fast.

Random access memory availability includes not only the
speed-density trade-offs, but also the width of the data path of
one chip, and the coupling or independence of the data input
and output paths. The 4K words by 1 bit fast static memory is
expected to dominate fast data needs eventually, although the
1K by 4 bits is useful in the control memory area. Bulk data
memory requirements should use random access memory of 16K
densities or higher, even if that requires dynamic storage and
refresh mechanisms.

MOS micro processors can be categorized into: a) current
leaders, such as the 8080 (for having the most people adding parts
that mate with it), the 6800 (for technical niceties) and the F8
(for absolute minimum number of IC's), and b) the improved micro
processors like the EA9002 and the Z80.

Other IC's to which some, but not much, attention should
be paid include the FPLA and the 3 to 8 decoder. The latter is
useful in block floating point operations for storing exponents
in the data stream, as well as for traditional selection tasks.
The FPLA may have some applications in the scaling element, but
is much weaker in capability than Raytheon's 300 gate array.

SECTION V

SOFTWARE


5.1  Simulation  Objective 

The simulation activity for the micro signal processor
is based on a mixed functional and register level model of the
architecture. The result is useful in both design verification
and software debug. Under this study the assembler and simulators
for the address generator and the sequencer were developed
and coupled. Provisions were made for easy extension to the whole
µSP under the next phases of this study.

5.2  Philosophy 

The simulation is designed to run on a CDC Cyber 73.
It is assumed that Fortran IV has been chosen as a standard to
ensure the transferability of the simulator. PMS (Processor,
Memory, Switch) and ISP (Instruction Set Processor) descriptions
are employed to be certain that modelling accurately tracks
the computer architecture. It is assumed that the FIFO between
the Sequencer and ADGEN (Address Generation) functions acts as an
exclusive asynchronous communication buffer, i.e., the Sequencer
and ADGEN portions can be treated as independent processors with
the exception of the I/O protocol via the FIFO. In terms of the
control structure both processors are slaved to the state of the
FIFO.

Effort was focused on the control element first rather
than on the pipeline. Pipeline simulation is straightforward
because of the fixed clock count nature of all macro execution.
A functional level simulation is most appropriate, which in turn
depends on defining a large collection of macro instructions.
More insight into and confidence in the operation of this µSP
approach is gained from developing the control simulation first.




The simulation can be viewed as an assembler/simulator since
a portion of the simulator will be an assembler. Two uses of the
system can be envisioned and are included in the design philosophy:
1) design aid, and 2) trainer. Its use as a design aid is
obvious. Its use as a trainer resides in the fact that it is an
assembler/simulator with which a user can practice coding prior
to equipment availability.

The software architecture is rather classical. It should be
noted that the models themselves form a small portion of the
system whereas the rest supports user I/O and system
initialization functions. This type of structure permits easy
modification of either the model or user portions independently.

All logic levels in this simulation effort are described as
0 or 1 (i.e., low or high) rather than true or false, with the
exception of TC (test condition), which will always be true=low.

The following bit numbering convention is used:
the leftmost bit position will be labelled bit 0, with bit
labels increasing monotonically to the rightmost bit, e.g.

    |   op   | i | P |   address   |
     0      3  4   5              11
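
Because bit 0 is the leftmost bit, the shift needed to extract a
field depends on the word width. The sketch below illustrates this.
The 12-bit width and the field boundaries (op in bits 0-3, i in bit
4, p in bit 5, address in bits 6-11) are read from the example
layout above and should be treated as assumptions, as should the
function names.

```python
WORD_WIDTH = 12  # assumed from the example layout (bits 0 through 11)

# Field name -> (first bit, last bit), numbered from the left.
FIELDS = {
    "op": (0, 3),
    "i": (4, 4),
    "p": (5, 5),
    "address": (6, 11),
}


def extract_field(word, first_bit, last_bit, width=WORD_WIDTH):
    """Extract bits first_bit..last_bit, where bit 0 is the leftmost bit."""
    length = last_bit - first_bit + 1
    # Distance of the field's last bit from the right-hand end of the word.
    shift = width - 1 - last_bit
    return (word >> shift) & ((1 << length) - 1)


def decode(word):
    """Split a word into its named fields."""
    return {name: extract_field(word, first, last)
            for name, (first, last) in FIELDS.items()}


if __name__ == "__main__":
    # op = 1010, i = 1, p = 0, address = 000011
    print(decode(0b101010000011))
```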

Negative numbers are represented in two's complement form
within the µSP. There are two ways in which negative numbers
can occur. First, the programmer may insert them into a register
from the quantity field by a Sequencer-LQ instruction, and second,
repeated execution of the Sequencer-DR instruction will eventually
result in a negative value.

To facilitate recognizing and manipulating negative numbers
in two's complement form, some non-standard Fortran functions
have been used (i.e., MASK and SHIFT, and user defined functions
built up from these). These are the same functions which are used
to decode instructions and to implement the ISP concatenate function,
as described in reference 2. It has previously been decided that
a simulated µSP word (of whatever length) will occupy a full
CDC word. Hence, to take advantage of CDC defined relational
operators and arithmetic, the sign bit of the µSP word must be
propagated to the left to fill the entire 60-bit word. Once this
has been done, two's complement numbers can be converted to one's
complement, any arithmetic function may be performed and the
result converted back to its two's complement representation.
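
The sign propagation and representation conversion just described
can be sketched as follows. This is an illustration in Python, not
the Fortran MASK/SHIFT code of the simulator; the function names and
the 12-bit example word are assumptions.

```python
CDC_BITS = 60
CDC_MASK = (1 << CDC_BITS) - 1
SIGN_BIT = 1 << (CDC_BITS - 1)


def propagate_sign(word, width):
    """Copy the sign bit of a `width`-bit word leftward through 60 bits."""
    word &= (1 << width) - 1
    if word >> (width - 1):
        # Negative: set every bit above the original word.
        word |= CDC_MASK & ~((1 << width) - 1)
    return word


def twos_to_ones(pattern):
    """Convert a 60-bit two's complement pattern to one's complement."""
    # Negative values differ by exactly one between the two forms.
    return pattern - 1 if pattern & SIGN_BIT else pattern


def ones_to_twos(pattern):
    """Convert a 60-bit one's complement pattern back to two's complement."""
    return (pattern + 1) & CDC_MASK if pattern & SIGN_BIT else pattern


def ones_value(pattern):
    """Numeric value of a 60-bit one's complement pattern."""
    if pattern & SIGN_BIT:
        return -((~pattern) & CDC_MASK)
    return pattern


if __name__ == "__main__":
    word = 0xFFB                      # -5 as a 12-bit two's complement word
    extended = propagate_sign(word, 12)
    native = twos_to_ones(extended)
    print(ones_value(native))         # value seen by one's complement arithmetic
    print(ones_to_twos(native) == extended)
```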

5.3 Orientation

The Micro SP (signal processor) architecture can simply be
viewed as two dissimilar programmable processors linked by a
FIFO buffer acting as a one way asynchronous communication buffer
(see below).

    Sequencer --> FIFO --> Datapipe


The two processors are referred to as the Sequencer and the
Datapipe. The Datapipe exercises control over data memories and
an arithmetic pipeline. As such it is the "number cruncher"
portion of the SP. The Sequencer exercises control over the
Datapipe by means of pointers that are converted and used as
program counters within the Datapipe.

The Sequencer is master with respect to the Datapipe and
transfers its Datapipe pointers via the FIFO. The FIFO is required
to compensate for the asynchronous operation of the two units.

When the FIFO is full the Sequencer clock is inhibited. When
the FIFO is empty the Datapipe clock is inhibited.
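
The clock inhibit rule above can be illustrated with a small
cycle-by-cycle model. It is a sketch only: the FIFO depth, the fixed
number of Datapipe cycles per pointer, and the ordering of the two
sides within a cycle are assumptions, and the function name is
hypothetical.

```python
from collections import deque


def simulate(pointers, fifo_depth, datapipe_cycles_per_pointer, max_cycles=1000):
    """Return (executed pointers, sequencer stalls, datapipe stalls)."""
    fifo = deque()
    pending = deque(pointers)     # pointers the Sequencer still has to issue
    executed = []
    sequencer_stalls = 0
    datapipe_stalls = 0
    busy = 0                      # cycles left on the Datapipe's current macro
    cycle = 0
    while (pending or fifo or busy) and cycle < max_cycles:
        cycle += 1
        # Datapipe side (assumed to act first within a cycle).
        if busy:
            busy -= 1
        elif fifo:
            executed.append(fifo.popleft())
            busy = datapipe_cycles_per_pointer - 1
        else:
            datapipe_stalls += 1  # FIFO empty: Datapipe clock inhibited
        # Sequencer side.
        if pending:
            if len(fifo) < fifo_depth:
                fifo.append(pending.popleft())
            else:
                sequencer_stalls += 1  # FIFO full: Sequencer clock inhibited
    return executed, sequencer_stalls, datapipe_stalls


if __name__ == "__main__":
    done, seq_stalls, dp_stalls = simulate([1, 2, 3, 4, 5, 6], 2, 3)
    print(done, seq_stalls, dp_stalls)
```

With a Sequencer that can issue one pointer per cycle and a Datapipe
that needs several cycles per pointer, the Sequencer is the side
that stalls; pointer order is preserved in either case.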

The Datapipe is composed of two major parts, i.e., the
ADRGEN (Address Generator), which controls data flow, and the
arithmetic pipeline, which processes the data. Both of these units
are separately programmable.

The Sequencer code, which is not self-modifiable, resides in
Sequencer Memory. The instruction is composed of two major
segments, i.e., Sequencer control and Datapipe control. The Sequencer
control portion of the instruction effects the next address control
for the Sequencer PC (Program Counter). The Datapipe control
portion of the instruction is loaded (subject to test criteria)
into the FIFO. As time is available, the Datapipe reads the
Datapipe control data from the FIFO and distributes the two pointers
contained in it to the ADRGEN and pipeline respectively.


Figure 65 illustrates the data flow. The figure is highly
simplified and is presented in a way that would be most tutorial
for the assembler effort.


Figure 65 - µSP Instruction Data Flow

Although code for the two processors is independently
executable, the two are not independent from the user viewpoint. Since
the user expects to refer to ADRGEN and macro (i.e., pipeline)
programs by name rather than address, the assembler must associate
the two via its symbol table. Implementing this process implies
that symbolic entry points must be defined for the Datapipe.

From the user standpoint the most advantageous way of doing this
is to be able to freely intersperse the code for the two processors
into a merged but logically coherent source file. The task of
the assembler then would be to separate the two files, assemble
the ADRGEN code, then assemble the Sequencer code.
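
The separate-then-assemble order described above can be sketched as
follows. The source syntax here is entirely hypothetical: the "A"
and "S" line prefixes, the label form "NAME:" and the mnemonics are
invented for illustration and are not the µSP assembler's format.
The sketch only shows why the ADRGEN code is assembled first: its
entry points populate the symbol table that the Sequencer code then
refers to by name.

```python
def split_source(merged_lines):
    """Separate a merged source into ADRGEN and Sequencer lines."""
    adrgen, sequencer = [], []
    for line in merged_lines:
        tag, _, rest = line.partition(" ")
        if tag == "A":
            adrgen.append(rest.strip())
        elif tag == "S":
            sequencer.append(rest.strip())
        else:
            raise ValueError("unknown processor tag: " + line)
    return adrgen, sequencer


def assemble_adrgen(lines):
    """Return (code list, symbol table of entry point name -> address)."""
    symbols = {}
    code = []
    for line in lines:
        if ":" in line:
            label, line = line.split(":", 1)
            symbols[label.strip()] = len(code)  # address of next instruction
        line = line.strip()
        if line:
            code.append(line)
    return code, symbols


def assemble_sequencer(lines, symbols):
    """Resolve ADRGEN entry point names into pointer values."""
    code = []
    for line in lines:
        op, _, operand = line.partition(" ")
        operand = operand.strip()
        if not operand:
            code.append((op, None))
        elif operand in symbols:
            code.append((op, symbols[operand]))
        else:
            raise KeyError("undefined ADRGEN entry point: " + operand)
    return code


if __name__ == "__main__":
    merged = [
        "S CALL FFTPASS",
        "A LOAD: INC 1",
        "A INC 2",
        "A FFTPASS: INC 8",
        "S HALT",
    ]
    adrgen_src, sequencer_src = split_source(merged)
    adrgen_code, table = assemble_adrgen(adrgen_src)
    print(table)
    print(assemble_sequencer(sequencer_src, table))
```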

The macro (pipeline) pointer values would have to be assigned
by the user in the Sequencer program. For simulation purposes,
however, the pipeline will be modelled with its functions included,
i.e., no assembler requirements exist for the pipeline at this time.

5.4 Micro Signal Processor Simulation

The purpose of the micro signal processor simulation is to
simulate the execution of the SP resident software. This
simulation is a mixed functional and register level simulation of the
micro signal processor architecture with features that
enhance its adaptability for reconfiguration purposes. The
simulation is coded in FORTRAN and is executable on the Cyber 73 at
Raytheon in Bedford.

In support of the task two new assemblers were developed:
one for the sequencer, and the other for the address generator
portion of the SP. These assemblers are configured in such a
way that they are equally useful for developing object code for
either the simulator or for actual hardware. These two mutually
independent symbolic cross assemblers permit the user to specify
the source code in a format that, by field, closely parallels
that of the generated object code.

The simulation portion of this assembler/simulator
system accesses the object code files, creates instruction memory
images, and simulates the instruction execution of the two
instruction processors (sequencer and address generator) subject
to the effect of the FIFO interface that serves as an
asynchronous interface buffer between the two devices. The user
exercises control over the simulation by means of a user run stream
that includes trace and debug features. They permit the user to
1) control the state of the machine (e.g., halt, run), 2) write
into or read from instruction memories, and 3) trace critical
registers during program execution.


5.5 Program Organization

The micro signal processor simulation consists of three
separate programs that operate under the control of the Cyber 73 NOS
operating system. The three programs are: the sequencer assembler,
the ADRGEN (address generator) assembler, and the simulator
(sequencer/FIFO/ADRGEN instruction set processor).

The sequencer assembler accepts source code (as shown in
Figure 65) and generates object code (identical in format to
that for the sequencer instruction memory) which is placed on a
CDC file. The ADRGEN assembler accepts source code (see Figure
65) and generates object code (identical in format to that for
the address generator memory) which is also placed on a CDC file.
The two files are input files to the simulation program. The
output of the simulator is a description of critical register states.

Figure  66  illustrates  the  simulation  system  flow.  Figure 
67  shows  the  hierarchy  diagram  for  the  simulator  itself. 


The µSP simulation system requires the sequential
operation of three programs: the ADRGEN assembler, the sequencer
assembler, and the simulator. There is no co-residency
requirement. Only the output files of the assemblers need be preserved
for the simulator.

Figure 66 - Micro SP Simulation System Flow Diagram

Figure 67 - Hierarchy Diagram

5.6 Software Development Status


The software for the micro signal processor simulation was
developed using Bedford Laboratories procedures for software
development. The requirements for the simulation were specified in
Instruction Set Processor (ISP) format for clarity. These
descriptions spanned the sequencer, FIFO, and address generator
portion of the system. They did not cover the pipeline functions
or structures.

The simulation spans events from source assembly to MAR
outputs. The pipeline functions are not incorporated. The trace/
debug functions are incomplete, as are the data reduction functions.

Extended  FORTRAN  features  of  the  Cyber  73  system  were  used 
to  facilitate  code  development.  The  areas  of  such  usage  are  well 
defined  and  subsequently  may  be  subjected  to  standardization. 

Testing of all programs was conducted as they were generated. A
final acceptance test consisted of the assembly and execution of
an FFT (Fast Fourier Transform) program which is described later.
The results of this benchmark were verified against anticipated
register states (e.g., memory address registers) and found to be
consistent with hardware design.

After the programs had been completely checked out, the
program configuration control items, deck and listings, were placed
in the Digital Systems Laboratory Program Library. Final
documentation consists of a simulation user's manual - Raytheon BR-9632.
These specifications contain all of the information necessary
for the use of the complete program. A copy of the program
listing is enclosed as an appendix to that report.

5.7 Future Development

The design of the simulator includes features which anticipate
the future incorporation of trace/debug, statistics, and report
generation options. Another improvement is to link the two
assemblers so that sequencer instructions can refer to ADRGEN entry
points by name. The simulator currently has no arithmetic
capability at the pipeline level. This should be added to prove out
the correctness of macro programs (in an arithmetic sense). Data
can be externally generated by a user program and loaded via a
data memory loader. This feature would be a desirable adjunct
to pipeline testing. The trace/debug features permit a wide
variety of run time checkout of intermediate data. The statistical
data reduction would require the implementation of the data
collection and reduction function library. The report generation
could be as elaborate as utility dictates.




SECTION  VI 

CIRCUIT  TECHNOLOGY 


6.1 Logic Technology Choices

The status of today's contending circuit technologies was
surveyed to aid in choosing the implementation directions for this
µSP for up to the next five years. This review is based
primarily on information gleaned from the public literature. Although
this does not yield the latest, hottest things in progress in the
private corners of semiconductor laboratories, it is within the
self-interest of makers of the latest circuit technology to drop
hints as to their direction and expectations. Moreover, due to
the high capitalization factor of the semiconductor industry there
is a strong and almost overwhelming impetus to push the process
presently being worked to its limits before considering the jump
to a drastically different process.

A strong bandwagon effect exists to follow the mainstream
even if it is not as good in certain aspects as other logic
technologies, just because everyone else does it that way. Reasons
for this "follow the leader" tendency include customer acceptance,
the need for background products to pay for new developments,
reliance on the same information sources, and strong mobility of
professionals between companies, just to mention a few.

Considerable filtering was done on these public sources of
information to try to separate the truth from market testing and
puffery. Still, as an industry, we talk a lot about how well we
do, including freely publicizing the latest chip masks and process
outlines. This is probably because such things are no secret
anyway once the first chip gets sold.

Figures 68 and 69 present the data for this circuit review.
Note the distinction between approaches in production and those
in development. The fact that a new process is in production
doesn't necessarily mean that its technical value is obsolete
when looking forward over the next several years. Higher
resolution lithography techniques alone are likely to add years, or
more, of service to today's "workhorses".

A key parameter is the packing density in gates/mm². The
spread in packing density between what is in high volume production
and what is experimental is about an order of magnitude. Not all
that density improvement can be realized due to chip partitioning
problems and I/O requirements. Thus, those µSP element design
parts considered for single chip implementation should be subject
to a marketing analysis of risk, expense, and payoff. The payoff
which justifies pursuing a higher packing density must be
significant improvements in parameters such as power, speed, chip totals
and/or fabrication ease.

Some general statements can be made about the most useful
aspects of each contending circuit technology. LSTTL is well known,
has a good speed-power product and is the industry workhorse.
I²L will provide considerably higher density, at slower speed, and
probably with TTL interface levels most of the time. CMOS-SOS
provides radiation hardness and low power for equivalent speeds of
LSTTL. ECL gives at least a factor of two more speed than TTL,
although the higher power densities require special cooling
considerations. NMOS is moving up in speed, with variations like
DMOS providing even better process control.

We conclude that with such a choice available, no one
technology is absolutely superior. A few complementary technological
points should therefore be chosen to cover the full range of needs.
For example, to encourage immediate applications of the µSP a
version based on available LSI building blocks, namely LSTTL, is
envisioned. For hybrid packaging, where power limitations may be
more significant than chip count, CMOS-SOS is the most viable
candidate. Both will use clock cycles in the 100 - 150 ns region,
depending on desired operating voltages and temperature margins.

An ECL type version has some attractiveness for the future as
applications push toward greater performance needs. An ECL
version, however, will probably use twice as many parts as a CMOS/SOS
version, assuming the present technology trends continue.

6.2 Gate Arrays

Considerable interest has been generated recently in gate
array approaches to LSI. Among the latest rumors is a 2000
gate I²L array being postulated by Signetics. Consequently we
have made a survey of companies mentioned in public literature
as having a gate array capability. The results of this survey
are presented in Figures 70 and 71.

We draw significant insight from hands-on experience at our
Microelectronic Facility. There, high speed gate arrays were
developed which closely follow the packaging density, integration
level and reliability of state-of-the-art custom LSI circuits,
but which can be personalized on the final interconnect levels
at low development costs. Such arrays can be manufactured
economically in very small production quantities of each personalization,
have a short one month development cycle, and have very predictable
performance. They can replace all the logic in a system with
personalized LSI, but are especially desirable when mixed with the
best off-the-shelf LSI devices to eliminate any SSI or MSI logic.

Arrays based on Schottky TTL circuits have been developed with
different speeds and power dissipation levels, all providing high
speed and good drive over the military temperature range.
Complexity has increased from 24 gates per array in 1968 to the
present 300 gates with 5 nsec delay, with 100 gate arrays in
advanced development. Recently a 1600 transistor CMOS/SOS array has
been developed, having faster speed and lower power than the TTL
version. A 5000 transistor array is under development, giving
roughly 1000 gate capability.

Figure 71 - Gate Array Survey II


The two key concepts that make this array approach a
viable candidate for µSP logic, and not a technology plaything, are
computer aided design (CAD) for mask generation and compatibility
of the array fabrication with an outside manufacturer's STTL or
CMOS/SOS line. The CAD system is particularly cost-effective
because experience with the layout of over 200 complex MSI/LSI
devices and over 100 hybrid circuits has shown that high yields
can be achieved. Such yields allow confident estimates of
hardware implementation and check-out time.

We conclude that the most worthwhile gate array candidates
today are the CMOS/SOS and, for higher speed at lower density,
some variation of ECL. I²L is a likely candidate for future
years, in that much greater density may be achieved at some speed
loss.

6.3 High Density Packaging

A fundamental packaging problem is the 1 in.² occupied by a
16 pin DIP, costing from $0.50 to more than $10.00 as a function
of the number of printed circuit layers and production volume.
Partial solutions include:

• Motorola's QUAD-IN-LINE package cuts 40 to 64 pin
DIPs to almost half the card area

• Flatpacks take 1/2 to 2/3 the volume of DIPs, but are not
the current industry standard

• Increased signal multiplexing keeps pin totals to
small DIPs but also increases design complexity

The most promising approach in our opinion is using chips
on thick film ceramic substrates, i.e., hybrid packaging. For
production volumes and complexities comparable to the above printed
circuit approach, a multilayer ceramic substrate cost varies from
2 to 50 dollars per square inch. Now component mounting
densities are limited primarily by power dissipation. Thus for a
reasonable mix of MOS and bipolar devices, 20 components/in.²
is plausible. The cost for mounting each device in a hybrid is
then between $0.10 and $2.50, an order of magnitude price
improvement.
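
The per-device figures above follow directly from dividing the
substrate cost per square inch by the mounting density. The short
calculation below reproduces them; the function name is an
assumption, and the inputs are the figures quoted in the text.

```python
def cost_per_device(substrate_cost_per_sq_in, components_per_sq_in):
    """Mounting cost attributed to one device on a hybrid substrate."""
    return substrate_cost_per_sq_in / components_per_sq_in


if __name__ == "__main__":
    density = 20                             # components per square inch
    low = cost_per_device(2.0, density)      # cheapest substrate, $/in.^2
    high = cost_per_device(50.0, density)    # most expensive substrate
    print(f"${low:.2f} to ${high:.2f} per device")
```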

Total system costs can thus be drastically lower with the
intelligent use of hybrid packaging. The components are purchased
properly tested, but without extra packaging, and are thus
inherently less expensive than flatpacks or DIPs. The key savings
are in the reduced number of modules required to do the total
job because of the higher packaging density achieved.

Reliability should be significantly greater with hybrid
packaging. The large number of solder connections from single
chip carriers (e.g., DIP or flatpack) to printed circuit board are
replaced by a very much smaller number of connections from hybrid
substrate to circuit board.

Figure  72  illustrates  the  expected  size  and  production  cost 
tradeoffs.  Packaging  approaches  vary  from  the  extreme  of  all 
commercial  DIPs  to  the  most  promising  combination  of  personalized 
gate  arrays  with  commercial  LSI  together  on  ceramic  substrated 


j 
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Figure  72  - System  Packaging  Size  and  Cost 

The tradeoff thus exists between pushing density on one chip
versus multi-chip hybrid packaging. The latter involves lower
risk than pushing chip technology, but the former has historically
had greater payoff.
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SECTION  VII 

DEVELOPMENT  PLAN 


7.1 Background

The High Speed Micro Signal Processor study resulted in a
definition of general specifications for a set of micro signal
processor functional elements. These elements form a modular,
expandable basis for a spectrum of avionics signal processor
configurations. The study was concerned with architectural
planning and functional partitioning to maximize the use of
mainstream commercial microprocessor devices and/or LSI array type
building blocks without restricting the design implementation.

A functional  level  simulator  design  was  initiated  and  partially 
completed.  Documentation  includes  an  Instruction  Set  Processor 
(ISP)  definition  of  the  modelled  elements  along  with  the  code 
and  user  manuals. 

7.2  Objective 

The micro signal processor (µSP) development plan is designed
to verify the µSP baseline design and evaluate the advanced device
development needed to support and implement the design. These
objectives shall be attained in discrete, measurable steps: a
low-risk processor implementation to establish benchmark data and
provide an advanced device test fixture; a device development
program to develop chips not expected to be available in commercial
markets; full processor implementation in advanced technology;
and verification of the device development program. The program has
stressed technology independence in implementation. Approaches
that maximize this philosophy are desirable and will be viewed
favorably by the evaluators.

7.3 Program Discussion

The  contractor's  proposal  fulfilling  the  objectives  of  the 
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development  plan  must  provide  AFAL  with  a usable  product  at  each 
phase of subsequent development, as outlined in Figure 73.

The  initial  step  is  a technology  verification  model  to  pro- 
vide AFAL  with  three  capabilities: 

1. A desk-top micro signal processor which can
interface into commercial µPs or existing AF
base computing facilities.

2.  A test  fixture  and  development  tool  to  test  and 
validate  subsequent  chip  development  capabilities 
in  subsystem  use. 

3. A firmware/software development tool to develop
and test algorithms and develop software for the
advanced technology model.

The  second  step  is  the  chip  development  of  critical  func- 
tions. These  chips  will  be  functionally  compatible  with  the 
technology  verification  model  and  can  replace  the  commercial 
functions, allowing AFAL to measure the advanced device
performance.

The  third  step  is  the  development  of  the  remaining  chips 
required  to  complete  the  advanced  technology  model.  When  each 
chip  is  developed,  it  can  be  placed  into  the  commercial  version 
to  demonstrate  the  viability  of  each  chip. 

7.4  Statement  of  Work 

The requirements to finalize the design, fabricate, and test the
High Speed µSP consist of three phases. Task descriptions by
phase are shown below; schedules for each phase are shown in
Figure 74. The contractor is to provide personnel, materials,
and facilities with the objective of completing the following
development and demonstration tasks (outlined in Figure 75).

Figure 75 - High Speed µSP Development Plan


7.5 Phase IIa - Technology Verification Model (TVM)


A detailed design shall be implemented using currently
available commercial µP elements. The µSP logic design should
be verified by simulation and the hardware implementation
supported with suitable firmware programs to support subsequent
development and testing.


1) The contractor shall provide detailed design
specifications for implementing the µSP conforming to the
requirements definition outlined in section one.

2)  The  design  specification  shall  include  a specifica- 
tion outlining  the  requirements  for  firmware.  The 
contractor  shall  demonstrate,  in  appropriate  se- 
quences, the  capability  to  process  the  following 
algorithms  for  test  purposes. 

FFT with input weights
FIR Filter
IIR Filter
Magnitude & CFAR Thresholding
EW Signal Sort
2-Dimensional Correlation

This  specification  shall  be  the  subject  of  a design 
review  at  the  end  of  month  three. 

3) The contractor shall implement the approved design
specification using CAD systems technology and verify
the designs with the previously developed simulation
programs (descriptions provided with the RFP).
This stage shall verify that the processor detailed
design performs as required; when such verification
is established, the contractor will fabricate the
technology verification model.
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4)  The  fabricated  models  will  be  tested  using  the  firm- 
ware and  test  procedures  developed  in  the  design 
specification.  At  the  conclusion  of  the  develop- 
ment tests,  an  official  acceptance  test  will  be  con- 
ducted at  AFAL  facilities  in  Dayton,  OH. 

5) The TVM shall be used to evaluate and demonstrate
the viability of subsequent chip developments in
Phase IIb and Phase III. When these chips are
developed, they will be substituted into the TVM,
replacing the commercial logic, and run with the
previously developed algorithms.

6) A Phase IIa report shall be prepared and will include
the specifications, the detailed design data, block
diagrams, parts list, and description of firmware
and procedures for developing subsequent software
algorithms.
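
To make two of the test algorithms named in item 2 concrete (FFT with input weights, followed by magnitude and CFAR thresholding), the following is a minimal sketch in plain Python. It is not µSP firmware and not the study's method: the transform size, tone bin, Hanning weights, cell-averaging CFAR variant, guard/training cell counts, scale factor, and noise floor are all assumptions chosen for illustration.

```python
import cmath
import math

def hann_weights(n):
    """Hanning (raised-cosine) input weights of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle * odd term
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ca_cfar(mag, guard=1, train=4, scale=4.0, floor=1e-6):
    """Cell-averaging CFAR over a circular spectrum.

    A cell is declared a detection when it exceeds scale times the mean
    of the training cells on both sides (guard cells are skipped).
    The absolute floor (an assumption) keeps rounding residue from
    triggering detections in this noise-free example.
    """
    n = len(mag)
    hits = []
    for i in range(n):
        cells = []
        for d in range(guard + 1, guard + train + 1):
            cells.append(mag[(i - d) % n])
            cells.append(mag[(i + d) % n])
        noise = sum(cells) / len(cells)
        if mag[i] > max(scale * noise, floor):
            hits.append(i)
    return hits

N = 64          # assumed transform size
TONE_BIN = 10   # assumed test tone, exactly on a bin centre
signal = [cmath.exp(2j * math.pi * TONE_BIN * k / N) for k in range(N)]
weighted = [w * s for w, s in zip(hann_weights(N), signal)]
magnitude = [abs(v) for v in fft(weighted)]
detections = ca_cfar(magnitude)
print(detections)
```

With these assumed parameters the tone is reported in bins 9, 10, and 11: the Hanning weights suppress sidelobes at the cost of widening the main lobe to three bins.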

7.6 Phase IIb - Critical Device Development (CDD)

These µSP functional elements should be implemented in the
desired technology and subsequently integrated into the µSP
point design, replacing their commercial counterparts. This test
will provide incremental verification that the LSI implementation
is functionally identical to the commercial design.

Critical devices will be built, superseding equivalent
commercial logic in the TVM. Estimated costs are shown in
Figure 76.

1) The contractor shall design, build, and test 4 chips
from the list of candidate critical devices shown in
section 1. In the course of this development, the
contractor shall develop a functional design
specification for each device including device test and
evaluation data.


2) The contractor shall implement a detailed design of
the chips upon approval of the design specification
by AFAL. Array technology is recommended for this
implementation to enhance the probability of
evaluating the technology design in the TVM.

3)  The  contractor  shall  demonstrate  the  critical  device 
capability  by  substituting  these  devices  into  the 
TVM  and  running  the  baseline  algorithms.  Relative 
benchmark  comparisons  shall  be  normalized  to  compen- 
sate for  attenuation  by  miscellaneous  logic  needed 
to  make  the  TVM  interface. 

4) A Phase IIb final report will be prepared outlining
the conclusions, test data evaluation, chip designs,
and other appropriate technical data.

7.7 Phase III - µSP Advanced Technology Model (ATM)

The  remaining  functional  elements  will  be  designed  and  veri- 
fied in  a similar  fashion  as  in  the  TVM  phase.  Supporting  de- 
vice characterization  tests  will  also  be  performed  to  fully  de- 
scribe the  devices. 

The µSP functional units consist of arithmetic, control and
memory elements.

1) The contractor shall identify the remaining devices
necessary  to  complete  the  elements  of  an  advanced 
technology  model.  This  list  of  devices  shall  include 
the  list  provided  in  Task  II  so  AFAL  may  evaluate 
the  priority  of  implementation. 

2) The contractor shall design, build, and test a complete
ATM using the selected advanced technology,
using an AFAL approved specification, in either custom
or gate array implementation. The contractor
shall demonstrate how he intends to complete
custom designs within the timetable available for
this phase.

3) The contractor shall develop a test/integration
procedure for testing the chips as devices and system
elements. The procedure shall allow for substituting
these functional elements into the TVM and measuring
the performance. The procedure shall ensure that
sufficient combinations of the functional elements
are tested in the TVM before the µSP ATM is tested
as a standalone unit.

4)  Upon  completion  of  the  hardware,  system  tests  using 
the  previously  developed  algorithm  shall  be  conduc- 
ted to  measure  the  relative  increase  in  performance 
in  this  phase. 

7.8  Phase  IV  - Application  Verification  Phase  (AVP) 

Application firmware developed by AFAL/Contractor will be
used to demonstrate the µSP ATM during a four-month demonstration
phase.

Code  for  AFAL  selected  missions  will  be  developed  and  demon- 
strated as  required.  This  task  is  intended  as  a level  of  effort 
software support phase. Estimated costs for Phase III and
Phase IV are shown in Figure 77.

Figure 77 - Phase III - Advanced Technology Models


APPENDIX  A 

CONTRACT  REQUIREMENTS 
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1.  Contract  Requirements 


The  microsignal  processor  program  statement  of  work  and  Ray- 
theon's fulfillment  through  schedule  and  cost  are  summarized 
herein. 

1.1 Summary of Work Requirements
Tasks/Requirements 

In achieving the objectives of this program the contractor
shall accomplish the following tasks:


1.1.1 Functional Analysis: Functional requirements
generic to airborne signal processing applications shall be
defined.

Consideration shall be given to signal processing
tasks inherent to radar, both air-to-air and air-to-ground modes
including synthetic aperture ground map, electronic warfare,
signal sorting and classification and communications,
image/waveform coding and decoding. Algorithm classes to be
considered and characterized shall include, but are not limited to:

• Digital Fourier transforms and inverse transforms
to 2048 points.

• Digital filters, recursive and nonrecursive.

• Weighting functions, cosine squared, Taylor,
Hanning, etc.

• Correlations, serial and parallel, various levels/
combinations of source and reference signals.

• Walsh functions, Hadamard transforms, and related
waveform/image coding transformations.

• Adaptive predictive coding, data bandwidth
compression techniques.

• Nested polynomial functions, look up tables, etc.,
for common arithmetic computation.

• Integration, Averaging, and Standard Deviation

• Coordinate conversion
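
Two of the algorithm classes listed above, nested polynomial evaluation and the Hadamard transform, are compact enough to sketch. The Python below is an illustration only; the function names and sample data are hypothetical, and the transform is given in natural (Hadamard) order without normalization.

```python
def horner(coeffs, x):
    """Evaluate a polynomial in nested form; coeffs are highest order first.

    Each term costs one multiply and one add.
    """
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

def fwht(values):
    """Fast Walsh-Hadamard transform (natural order, unnormalized).

    The length must be a power of 2. Only additions and subtractions
    are needed, arranged in log2(n) butterfly stages.
    """
    data = list(values)
    n = len(data)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = data[i], data[i + h]
                data[i], data[i + h] = a + b, a - b
        h *= 2
    return data

# 2x^2 - 3x + 1 evaluated at x = 4
print(horner([2.0, -3.0, 1.0], 4.0))

# Transform of an 8-sample sequence; applying the transform twice
# returns the input scaled by the length.
samples = [1, 0, 1, 0, 0, 1, 1, 0]
spectrum = fwht(samples)
print(spectrum)
```

Because the Hadamard butterflies use no multiplies, this class of transform stresses adder throughput and data movement rather than multiplier speed, which is one reason it is characterized separately from the Fourier transform.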

Processing tasks shall be tabulated with respect to typical
performance parameters required of the signal processor in areas of:

• Word  size  - range  and  modularity 

• Fixed,  floating  point  arithmetic 

• Operation  mix  - arithmetic/logic/control 

• Processing  rates  - operations  per  second 

• Input/Output  - operations,  rates,  formats 

• Memory  - size,  organization 

• Data  handling  requirements 

• Environmental/engineering constraints - size,
weight, cooling, temperature, etc.

Performance ranges and levels, degrees of modularity and
commonality of architecture/design features for a programmable signal
processor shall be assessed.
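
As an illustration of how a "processing rates" entry in such a tabulation might be estimated, the sketch below counts arithmetic operations for a radix-2 FFT of the largest size named above (2048 points). The per-butterfly costs are the standard ones (one complex multiply and two complex adds); the transform rate of 1000 per second is purely an assumed figure, and the function names are hypothetical.

```python
def radix2_fft_ops(n_points):
    """Operation counts for one radix-2 FFT; n_points must be a power of 2.

    Counts every butterfly as 4 real multiplies and 6 real adds
    (one complex multiply plus two complex adds); trivial twiddle
    factors are not discounted.
    """
    stages = n_points.bit_length() - 1      # log2(n_points)
    butterflies = (n_points // 2) * stages
    return {
        "stages": stages,
        "butterflies": butterflies,
        "real_multiplies": 4 * butterflies,
        "real_adds": 6 * butterflies,
    }

def required_rate(n_points, transforms_per_second):
    """Real arithmetic operations per second for a sustained FFT load."""
    ops = radix2_fft_ops(n_points)
    per_transform = ops["real_multiplies"] + ops["real_adds"]
    return per_transform * transforms_per_second

counts = radix2_fft_ops(2048)
print(counts)
print(required_rate(2048, 1000))  # assumed load: 1000 transforms per second
```

For 2048 points this gives 11 stages and 11,264 butterflies per transform, so the required rate scales directly with however many transforms per second the application demands.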

1.2 Performance Analysis

From  the  data  base  developed  under  Task  1.1.1,  a set  of 
signal  processing  tasks  (minimum  of  4)  shall  be  selected  as  a base 
reference  for  detailed  performance  requirements  analysis.  Inso- 
far as  practical,  these  tasks  shall  be  representative  of  the  com- 
plete spectrum  of  airborne  signal  processing  requirements  which 
are  anticipated  to  be  characteristic  of  platforms  and  avionic 
subsystems  for  the  1980-1985  era.  Typical  examples  of  applica- 
tion tasks  include: 

• Synthetic Aperture Ground Map

• Fast/Slow Moving Target Detection

• Terrain Following/Avoidance

• Ground Moving Target Track

• Air-to-Air Search and Track

• A/G, A/A Weapon Delivery

• RF Signal Sorting and Classification

• Image Compression/Jam Resistant Transmission

The baseline processing tasks shall be analyzed with
respect to mathematical theory, task partitioning, and flow charting.
Loading  characteristics  associated  with  subtasks  of  the  computa- 
tional decompositions  and  subtask  interrelationships  shall  be 
described  in  detail.  Critical  subtasks  shall  be  identified. 
Consistency  of  computational  requirements  across  the  tasks  will 
be  assessed.  Candidate  architectural  features,  hardware  design 
considerations,  and  related  software  elements  to  structure  a 
modularly  expandable,  programmable  signal  processor  to  accomplish 
the  signal  processing  functions  shall  be  identified. 

Signal  processing  tasks  to  be  considered  under  this  Task 
shall  be  subject  to  prior  approval  of  the  Air  Force  project  en- 
gineer. Candidate  task  listings  shall  be  proposed  by  the  con- 
tractor at  an  initial  program  review  to  be  conducted  at  AFAL  as 
soon  as  practical  for  work  progress,  but  no  later  than  60  days  af- 
ter contract  start. 

1.3 State-of-the-Art Review

The  industry  LSI  microprocessor  technology  base  shall  be 
reviewed.  Existing  state-of-the-art  and  projected  developments 
over the next five (5) years shall be addressed for such elements as:


• Memories  (RAM,  ROM,  PROM,  EAROM) 

• Interface  Circuits 

• Software/Firmware 

• Design/Development Support Tools

The applicability and availability of these elements to
signal processing tasks in general, and specifically to the
baseline processing tasks of Task 1.2, shall be assessed for the
development time frame identified in Task 1.7.

1.4 Functional Definition of Elements

Top  level  functional  specifications  shall  be  developed 
for  a set  of  hardware  micro-signal-processor  circuit  elements 
which  form  a basis  for  modularly  configured,  programmable  signal 
processing capabilities indicated by the results of Tasks 1.1 and
1.2. The analysis shall consider implementations necessary for
operation  and  support  of  the  micro-signal-processor  elements. 
Functional  descriptions  shall  address,  but  are  not  limited  to,  the 
following: 


• Hardware Elements
  element architecture (general)
  performance (function, speed, word size)
  internal data paths/flow, control points
  interfaces (I/O, data, control)
  physical/environmental characteristics

• Processor Configurations
  processor architecture/modularity
  I/O, control interfaces
  memory (capacity, segmentation, organization)
  interelement data-control flows
  performance capability (thruput)

• Software
  control word formats (macro/micro)
  control algorithms and sequences
  software elements
  language
  macro library
  I/O software
  sequence control
  macro/micro simulator
  macro/micro assembler


Major tradeoff rationales (cost, performance, reliability,
maintainability, etc.) to support the hardware-software
specifications shall be identified.

1.5 Simulation

The hardware-software elements/approaches identified in
Task 1.4 shall be simulated to verify applicability to the
baseline processing tasks of Task 1.2. The simulation is to address
the ability of the basic architecture and instruction repertoire
to achieve the desired signal processing functions rather than
technology implementation. A discrete event simulator shall
be developed and used for the demonstration-evaluation of the
modularity and expandability features of the
micro-signal-processing elements for the system environment. The
simulator shall be written in Fortran IV.
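
The discrete event technique called for above can be sketched in a few lines. The example below is written in Python for brevity rather than the Fortran IV the task specifies, and it does not represent the simulator actually developed; the class name, event names, and delays are all hypothetical.

```python
import heapq

class EventSimulator:
    """Minimal discrete event simulator: a clock plus a time-ordered queue."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._seq = 0      # tie-breaker keeps equal-time events in FIFO order
        self.log = []

    def schedule(self, delay, name, action=None):
        """Queue an event to fire 'delay' time units after the current time."""
        heapq.heappush(self._queue, (self.now + delay, self._seq, name, action))
        self._seq += 1

    def run(self):
        """Pop events in time order, advancing the clock to each one."""
        while self._queue:
            time, _, name, action = heapq.heappop(self._queue)
            self.now = time
            self.log.append((time, name))
            if action is not None:
                action(self)   # an event may schedule further events

# Hypothetical element behaviour: a memory fetch triggers an arithmetic
# operation, which in turn triggers a store.
def fetch_done(sim):
    sim.schedule(3.0, "arithmetic done", arith_done)

def arith_done(sim):
    sim.schedule(1.0, "result stored")

sim = EventSimulator()
sim.schedule(2.0, "operand fetched", fetch_done)
sim.schedule(2.5, "control word issued")
sim.run()
for time, name in sim.log:
    print(time, name)
```

Because the clock jumps from event to event instead of stepping through every cycle, adding more elements to a configuration adds events to the queue without changing the simulator core, which suits the evaluation of modularity and expandability.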

1.6 Circuit Technology Review

Status and characteristics of integrated circuit technologies
will be reviewed to determine their feasibility/suitability
for LSI implementation of the elements defined in Task 1.4.
Projected availability considerations will be addressed in addition
to fundamental performance capabilities. Recommendations shall
be supported by tradeoff analysis.


1.7 Development Plan

A plan for the hardware and software implementation and
performance demonstration of the micro-signal-processing elements
shall be defined. The plan shall address task breakout, schedules
and block funding requirements consistent with:

I. Requirements/Definition Phase: 7 months (this program)

II. Implementation Phase: 24 months

III. Performance Verification: 6 months

Potential advantages of accelerated and/or overlapped
Phase  II,  III  developments  shall  be  assessed  for  the  processor 
implementation  approach  developed  under  Tasks  1.3  - 1.6. 

1.8 Program Reviews

The contractor shall conduct three program reviews.
These reviews will be held at the AF Avionics Laboratory with
review agendas to include briefings on all task progress to date
and specific plans for the ensuing reporting period.

1.9 Cost & Schedule Performance

Raytheon's proposed schedule for the seven-month program
is shown in Figure 1.1; all tasks were completed on schedule, while
reports and presentations varied by a few weeks.

• Tasks 1 and 2 (Functional and Performance Analysis)
overlap because the functions-requirements-functions
sequence becomes an iterative loop in the proposed
top-down analysis.

• Task 3, State-of-the-Art Review, resulted in collection
and collation of data to determine industry trends.

• Task 4, Functional Definition of Elements, was developed
in parallel with the element definition.

• Task 5, Simulation: the simulator was developed in
parallel with the element definition.

• Task 6, Circuit Technology Review, was developed from a
literature search and used for the Phase II and Phase III
recommendations.
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